Hi.

The following bug has existed in the ipw2100 driver/firmware for years, and Intel folks have never responded to the zillions of bugzilla entries and forum notices on the internet with a patch or firmware update (although they did request dmesg and debug info, and received them).

ipw2100: Fatal interrupt. Scheduling firmware restart.

I believe it is a firmware bug, because after the driver is unloaded and loaded back again the wireless adapter usually starts working (for a small amount of time though). My conspiracy feeling suggests that it may be a kind of push to buy a new one, or a trivial error in the firmware, where it writes to the same place in the flash and essentially the given cell became dead, or whatever else.

Intel folks, please fix this problem. I see no other way to force you to do this than to mark the ipw2100 driver as broken, since that is what it is.

The bug exists in at least the .15 up to .24 kernels; just search for the above dmesg line. I caught it with the 2.6.24-19-386 Ubuntu kernel, 1.3 firmware version.

lspci:
02:04.0 Network controller: Intel Corporation PRO/Wireless LAN 2100 3B Mini PCI Adapter (rev 04)
	Subsystem: Intel Corporation Samsung X10/P30 integrated WLAN
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 64 (500ns min, 8500ns max), Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 11
	Region 0: Memory at 90080000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=1 PME-

dmesg is pretty usual.

Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>

diff --git a/drivers/net/wireless/Kconfig b/drivers/net/wireless/Kconfig
index 9931b5a..c24fc6a 100644
--- a/drivers/net/wireless/Kconfig
+++ b/drivers/net/wireless/Kconfig
@@ -125,7 +125,7 @@ config PCMCIA_RAYCS
 config IPW2100 ...
You are pretty funny, actually. :) I think the bug should be fixed, but what makes _you_ think you can _force_ -- Greetings Michael. --
Maybe because I bought that adapter and it stopped working and Intel knows about this bug and does not fix it for years? -- Evgeniy Polyakov --
On Sun, 21 Sep 2008 21:23:17 +0400 so now you go from an occasional burp to having nothing at all. How about you run with this patch on your own machine only? or.. since you say a reload of the driver fixes it.. why don't you make a patch for the driver that does basically the actions of a reload automatically when the driver detects the issue? (and stick a WARN_ON in for good measure so that kerneloops.org can start tracking these burps) -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org --
And how else can a user get attention to a problem which is not fixed by the vendor? We close our eyes and there is no problem, since we do not see it. I just brought a lamp: no user can see that essentially driver It stops after several seconds (or packets?). Sometimes (but rarely) it works for several minutes, sometimes it fires the above dmesg line and continues to work, sometimes it fires it for a while and then stops writing it, although the driver does not send or receive anything (at least the ifconfig counters do not change). Actually, I do not think it is a driver problem, since what it does is pretty straightforward, but if you tell me how else we can fix this issue, I will print it and glue it near the window so this trick can be used with other problems. Or you can (as everyone else does) just say that this is damn wrong and forget about the problem for the next several years. -- Evgeniy Polyakov --
On Sun, 21 Sep 2008 22:28:38 +0400 again.. so how about you detect this condition and do, in the driver code, the equivalent of rmmod/insmod to the hardware. I'm sure people who have the hardware would appreciate that type of patch a lot more than the one you sent out. -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org --
I guess it is your way of "middle finger" to all the IPW2100 customers who try to use it on a Linux machine. Thanks Wei --
On Sun, 21 Sep 2008 14:52:37 -0400 if suggesting a workaround is giving the middle finger in your mind, then I don't think it's worth my time to discuss this further with you. -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org --
Which does not have access to the firmware... Which IMO is failing and

The reset task effectively does ipw2100_up(), so the difference is power cycles over the PCI bus and enable/disable/request commands. Like this stuff:

	/* We disable the RETRY_TIMEOUT register (0x41) to keep
	 * PCI Tx retries from interfering with C3 CPU state */
	pci_read_config_dword(pci_dev, 0x40, &val);
	if ((val & 0x0000ff00) != 0)
		pci_write_config_dword(pci_dev, 0x40, val & 0xffff00ff);

I do remember I had a Tibetan monk course in decoding the ipw2100 PCI config address space, I just need to find my kimono.

Do you want me to implement the ipw2100 driver as a big work structure which will run ipw2100_init()/wait/ipw2100_exit() in a loop? And that will be the fix suggested by Intel? That would explain a lot.

P.S. And some people say that asking for a bug bisection puts hard pressure on the user. The vendor has to ask him to fix the bug himself instead, and that will be a solution!

Given the fact that rmmod/insmod does not always fix the problem (but does most of the time, for a short period), I again want to point out that it looks like a firmware problem related to some inner timings. You ask me to fix the driver and do not even listen to what I said previously, and do not take it into account and analyze it. -- Evgeniy Polyakov --
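The bit arithmetic in the quoted RETRY_TIMEOUT snippet can be checked in isolation. The following is a minimal userspace sketch, not driver code: the helper names `retry_timeout_is_set` and `clear_retry_timeout` are hypothetical, and only the two masks come from the snippet above. The dword read at config offset 0x40 covers bytes 0x40..0x43, so the byte at offset 0x41 sits in bits 8..15 of that value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: true when the byte at config offset 0x41
 * (bits 8..15 of the dword read at 0x40) is non-zero. */
static int retry_timeout_is_set(uint32_t val)
{
	return (val & 0x0000ff00u) != 0;
}

/* Hypothetical helper: same mask as the quoted snippet; clears only
 * bits 8..15 and leaves the other three bytes untouched. */
static uint32_t clear_retry_timeout(uint32_t val)
{
	return val & 0xffff00ffu;
}
```

For an input of 0x12345678 the second helper yields 0x12340078: only the 0x56 byte is zeroed, which is why the driver can write the masked value back without disturbing the neighbouring config bytes.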
I think what Arjan is saying is that it would be better to put pressure on the responsible folks (I don't think Arjan is anywhere near them at all) if you'd put in a WARN_ON() for this error and that would make the top entry on kerneloops.org all the time... And additionally put in a workaround for yourself for now. And can we keep the flames off this list please? That comment from Wei Weng was absolutely uncalled for, and inciting a flamewar (as you have already blogged) was not really productive either. johannes
Hi.
As I pointed out, I can rewrite the whole driver's initialization process
so that it looks like an init/wait/exit loop, which can be run at
module load and whenever the fatal interrupt fires. Is this a fix? This is
not even remotely a workaround. We could just as well add
rmmod/modprobe/ifdown/ifup to a crontab job. Other users reported in
bugzilla that they needed to reboot the machine to make the card work
again. I'm not sure that user tried to do a rmmod/modprobe though.
If we keep silent, no one will notice that the problem exists.
I do hope this will result in progress. Arjan, do you agree to add
this patch to the current tree?
diff --git a/drivers/net/wireless/ipw2100.c b/drivers/net/wireless/ipw2100.c
index 19a401c..9a7b64c 100644
--- a/drivers/net/wireless/ipw2100.c
+++ b/drivers/net/wireless/ipw2100.c
@@ -206,6 +206,8 @@ MODULE_PARM_DESC(disable, "manually disable the radio (default 0 [radio on])");
static u32 ipw2100_debug_level = IPW_DL_NONE;
+static int ipw2100_max_fatal_ints = 10;
+
#ifdef CONFIG_IPW2100_DEBUG
#define IPW_DEBUG(level, message...) \
do { \
@@ -3174,6 +3176,10 @@ static void ipw2100_irq_tasklet(struct ipw2100_priv *priv)
if (inta & IPW2100_INTA_FATAL_ERROR) {
printk(KERN_WARNING DRV_NAME
": Fatal interrupt. Scheduling firmware restart.\n");
+ WARN_ON(1);
+
+ BUG_ON(ipw2100_max_fatal_ints-- <= 0);
+
priv->inta_other++;
write_register(dev, IPW_REG_INTA, IPW2100_INTA_FATAL_ERROR);
--
Evgeniy Polyakov
--
On Sun, 21 Sep 2008 23:38:09 +0400 BUG_ON in interrupt context is just extremely hostile, since it means the box is dead. also I would suggest using WARN_ON_ONCE() -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org --
Well, I actually wanted to have a bug there because of it, but now I
think that an annoying repeated warning is enough to bring attention to the
problem by putting the bug information into some magic special place called
the kerneloops collection.
Please consider this for inclusion in the upcoming kernel to get wider
notification. Yes, it is not a bugfix, I know.
diff --git a/drivers/net/wireless/ipw2100.c b/drivers/net/wireless/ipw2100.c
index 19a401c..6599211 100644
--- a/drivers/net/wireless/ipw2100.c
+++ b/drivers/net/wireless/ipw2100.c
@@ -206,6 +206,9 @@ MODULE_PARM_DESC(disable, "manually disable the radio (default 0 [radio on])");
static u32 ipw2100_debug_level = IPW_DL_NONE;
+static int ipw2100_max_fatal_ints = 10;
+module_param(ipw2100_max_fatal_ints, int, 0644);
+
#ifdef CONFIG_IPW2100_DEBUG
#define IPW_DEBUG(level, message...) \
do { \
@@ -3174,6 +3177,9 @@ static void ipw2100_irq_tasklet(struct ipw2100_priv *priv)
if (inta & IPW2100_INTA_FATAL_ERROR) {
printk(KERN_WARNING DRV_NAME
": Fatal interrupt. Scheduling firmware restart.\n");
+
+ WARN_ON(ipw2100_max_fatal_ints-- >= 0);
+
priv->inta_other++;
write_register(dev, IPW_REG_INTA, IPW2100_INTA_FATAL_ERROR);
--
Evgeniy Polyakov
--
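How many warnings the counted form in this patch would emit is easy to misjudge because of the post-decrement. Below is a small userspace model, a sketch only: `count_counted_warnings` is a hypothetical name, and the loop merely replays the expression `ipw2100_max_fatal_ints-- >= 0` from the patch once per simulated fatal interrupt. It does not call any kernel code.

```c
#include <assert.h>

/* Hypothetical model: replay the patch's condition
 * "counter-- >= 0" once per fatal interrupt and count how
 * often it is true, i.e. how often WARN_ON() would fire. */
static int count_counted_warnings(int max_fatal_ints, int fatal_interrupts)
{
	int counter = max_fatal_ints;
	int warnings = 0;
	int i;

	for (i = 0; i < fatal_interrupts; i++) {
		/* The comparison sees the value before the decrement. */
		if (counter-- >= 0)
			warnings++;
	}
	return warnings;
}
```

With the default of 10 the condition is true for the values 10 down to 0, so the model reports 11 warnings before going quiet, one more than the parameter name suggests. Since the parameter is registered with mode 0644, writing a larger value at runtime would re-arm it.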
On Mon, 22 Sep 2008 00:20:57 +0400 are you more interested in bringing attention than finding something that makes the driver work ? I sort of am getting that impression and still more complex than needed; a WARN_ON_ONCE() will be enough. --
I do think that it cannot be fixed without serious intervention from the Intel (hardware) folks, since the bug has existed for more than 4 years in two firmwares and lots of very different driver versions, and was reproduced even on a 2.4 kernel. I will experiment with reloading as Alan suggested, and add/remove more surgery in the initialization process in order to 'work around' the issue, since it looks like no one else will. But that's definitely not a fix and in my personal workaround's 10 That allows dumping whatever number of warnings you want. The more we have, the louder the customers' scream will be. -- Evgeniy Polyakov --
On Mon, 22 Sep 2008 00:57:06 +0400 artificially increasing numbers isn't going to do that; it just shows you're more interested in making a stink than in getting something -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org --
As practice shows, I'm the only one who is interested in getting something improved, and Intel, as we see right now, is not interested in it at all, since you ask me not only decrease error verbosity, but also do not work towards fixing the bug by trying to understand where it lives. -- Evgeniy Polyakov --
On Mon, 22 Sep 2008 01:05:55 +0400 I did no such thing and you know it. I'm sorry, I'm not going to waste time on this if you keep acting this dishonest; welcome to my mail filter... -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org --
You are not right. I had a totally dysfunctional Intel driver on two laptops and reported the issue to Intel. Yes, it took time; they took all the debug output and went into coma mode (or so I thought), but suddenly I got mail from them, and the next kernel/firmware release worked flawlessly for me. So they did a perfect job. Don't be negative, and prepare yourself for giving long debug outputs. Patience and only patience. --
Just to be clear, did it take 4 years? :) Anyway, I have already drawn my conclusions, as probably have others: I will experiment with different 'workarounds' for this bug, maybe I will succeed, maybe Intel will decide to fix it, maybe the LHC will crash the world. A verbose warning about the bug was frowned upon, so it's up to users to make progress here... -- Evgeniy Polyakov --
Any bugzilla entry? I cannot find on http://www.intellinuxwireless.org/bugzilla/ anything about this bug. I submit two reports in my case, one in kernel bugzilla, one in intel linux wireless project.. --
Lucky you :) -- Evgeniy Polyakov --
as Arjan and Alan pointed out already, WARN_ON_ONCE is enough and I agree with them. Just to make this perfectly clear, this is with my community hat on. Please send a proper patch with a simple WARN_ON_ONCE and I am happy to sign off on it. Regards Marcel --
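For comparison with the counted variant, the once-only behaviour requested here can be modelled in userspace. This is a sketch under stated assumptions: `warn_on_once_model` and `count_once_reports` are hypothetical stand-ins, not the kernel's WARN_ON_ONCE() macro, and the model captures only two properties of it: the report is emitted at most once per call site, and the condition value is still returned every time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for WARN_ON_ONCE(): "warned" plays the role
 * of the per-call-site flag. Returns the condition, and adds 1 to
 * *reports only the first time the condition is true. */
static bool warn_on_once_model(bool cond, bool *warned, int *reports)
{
	if (cond && !*warned) {
		*warned = true;
		(*reports)++;
		fprintf(stderr, "model: warning reported once\n");
	}
	return cond;
}

/* Simulate a number of fatal interrupts hitting a single call site. */
static int count_once_reports(int fatal_interrupts)
{
	bool warned = false;
	int reports = 0;
	int i;

	for (i = 0; i < fatal_interrupts; i++)
		warn_on_once_model(true, &warned, &reports);
	return reports;
}
```

Under this model 25 fatal interrupts produce a single report, which is enough for one entry per affected machine without flooding the log.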
I really do not care whether there is a warning at all, I just want that
bug to be fixed. And as we can see, something has started to change, and that's
probably a good sign. I am glad there is a result. I will check D3 states
tomorrow. The patch is attached in case you think it is still needed.
diff --git a/drivers/net/wireless/ipw2100.c b/drivers/net/wireless/ipw2100.c
index 19a401c..637dc05 100644
--- a/drivers/net/wireless/ipw2100.c
+++ b/drivers/net/wireless/ipw2100.c
@@ -3174,16 +3174,18 @@ static void ipw2100_irq_tasklet(struct ipw2100_priv *priv)
if (inta & IPW2100_INTA_FATAL_ERROR) {
printk(KERN_WARNING DRV_NAME
": Fatal interrupt. Scheduling firmware restart.\n");
+
priv->inta_other++;
write_register(dev, IPW_REG_INTA, IPW2100_INTA_FATAL_ERROR);
read_nic_dword(dev, IPW_NIC_FATAL_ERROR, &priv->fatal_error);
- IPW_DEBUG_INFO("%s: Fatal error value: 0x%08X\n",
- priv->net_dev->name, priv->fatal_error);
-
read_nic_dword(dev, IPW_ERROR_ADDR(priv->fatal_error), &tmp);
- IPW_DEBUG_INFO("%s: Fatal error address value: 0x%08X\n",
- priv->net_dev->name, tmp);
+
+ printk(KERN_WARNING "%s: Fatal error value: 0x%08X, "
+ "address: 0x%08X, inta: 0x%08lX\n",
+ priv->net_dev->name, priv->fatal_error, tmp,
+ (unsigned long)inta & IPW_INTERRUPT_MASK);
+ WARN_ON_ONCE(1);
/* Wake up any sleeping jobs */
schedule_reset(priv);
--
Evgeniy Polyakov
--
But if Intel don't care then you can scream all you like 8) A WARN_ON_ONCE is sufficient to capture an idea of how many people it is affecting and maybe to figure out what the trigger is from their reports; at that point there is some chance to get it fixed (especially if it's remotely triggerable ;)) Alan --
Well, the redhat, suse and ubuntu bugzillas happened to be not enough. Why do you believe a single warning at a new place will be? Or a couple of tens, or whatever else? If it cares, it cares. If it does not... I attracted the vendor's attention, the vendor told me to fix it myself and to create a patch to fill an entry in another 'bugzilla', so that the vendor could get results and probably decide to walk down from the cloud and fix it. So, if they do not care, I do not care about their care. That's the deal. I will try to find a workaround, even if it is real crap; fortunately other users will not hit this bug too frequently. -- Evgeniy Polyakov --
Evgeniy, you're bordering on being an asshole, if not actually being one. If you behaved this way for a bug I was responsible for, I would absolutely ignore you until you settled down and started to behave more reasonably. You're acting like a bomb which is about to explode, which is probably why the actual Intel maintainers for this driver don't want to touch you with a ten foot pole. You're being volatile and extremely unpleasant to interact with about this issue. The Intel folks replying to you right now are general Intel linux folks who are trying to help you, not the driver maintainers who can look into the firmware and attack that angle. So give them A FUCKING BREAK! Getting the OOPS to kerneloops.org is the way forward and will help your cause, whether you believe it or not. --
Out of curiosity, what's worse: being an asshole and pretend to be good That's the main point: 'until you started to behave more reasonably'. For example, filing another bug in the rh/suse/ubuntu bugzilla? Put yourself in the user's place, and suddenly the picture changes dramatically. We got some progress on this bug, at least there is a direct suggestion from Matthew about power state; if it fixes the issue, I think it is a good deal: one bug fix for a lot of users in exchange for a mail address in the killfile and a worsened 'reputation'. I provided a patch like Arjan wanted, and it can only change something because of all this talk I started by being an asshole. In my opinion. Maybe there were some other ways around, but it looks like being provocative is the only way to get to the cloud. Who knows :) -- Evgeniy Polyakov --
Has it occurred to you that YOU have a problem on YOUR machine, and that your patch would kill wireless for all the people who have the hardware on working systems? My experience was somewhat like Denys' except I got no notice, I just found that after an update the wireless worked solidly, and continued to do so until that laptop became obsolete and slow, and went to live with one of the It would be good to gather data rather than claim it doesn't work, because for My experience with laptops has been that you fiddle with power saving, and more, and more, until you find the tricks which make the laptop save power by disabling something you need, like network or display. At least that's been both my practice and observation, that not every machine responds well to every power saving trick. Have you checked for a BIOS update for the machine? Tried disabling all power saving settings and seeing if that changes the problem? I would normally assume you have, but you seem convinced that the bug is in the firmware and you're going to get it fixed. It may be a firmware bug, but if something in your system is triggering it, and most people don't have the problem, you might investigate a solution other than beating on Intel. -- Bill Davidsen <davidsen@tmr.com> "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot --
[Evgeniy Polyakov - Sun, Sep 21, 2008 at 11:38:09PM +0400]
| Hi.
|
| On Sun, Sep 21, 2008 at 09:14:04PM +0200, Johannes Berg (johannes@sipsolutions.net) wrote:
| > > Do you want me to implement ipw2100 driver as a big work structure
| > > which will run ipw2100_init()/wait/ipw2100_exit() in a loop?
| > > And that will be the fix suggested by Intel? That would explain a lot.
| >
| > I think what Arjan is saying is that it would be better to put pressure
| > on the responsible folks (I don't think Arjan is anywhere near them at
|
| Both maintainers were added to the copy list.
|
| > all) if you'd put in a WARN_ON() for this error and that would make the
| > top entry on kerneloops.org all the time... And additionally put in a
| > workaround for yourself for now.
|
| As I pointed, I can rewrite the whole driver's initialization process,
| so that it looked like init/wait/exit loop, which can be processed at
| the module load and when fatal interrupt fires. Do this a fix? This is
| not even a remotely workaround. We can just add
| rmmod/modprobe/ifdown/ifup to the crontab job. Another users reported in
| bugzilla that they needed to reboot a machine to make card working
| again. I'm not sure that user tried to do a rmmod/modprobe though.
|
| > And can we keep the flames off this list please? That comment from Wei
| > Weng was absolutely uncalled for, and inciting a flamewar (as you have
| > already blogged) was not really productive either.
|
| If we will keep silence, no one will notice that problem exists.
|
| I do hope this will result in a progress. Arjan, do you aggree to add
| this patch to the current tree?
|
| diff --git a/drivers/net/wireless/ipw2100.c b/drivers/net/wireless/ipw2100.c
| index 19a401c..9a7b64c 100644
| --- a/drivers/net/wireless/ipw2100.c
| +++ b/drivers/net/wireless/ipw2100.c
| @@ -206,6 +206,8 @@ MODULE_PARM_DESC(disable, "manually disable the radio (default 0 [radio on])");
|
| static u32 ipw2100_debug_level = IPW_DL_NONE;
|
| ...
The only reason for this change is to make a mark at the kerneloops. I.e. users know, there is a bug. Developers know, there is a bug. Everyone knows that there is a bug, but until it is at the special place we look to each other just like there is no bug. Here are dumps for example: http://www.intellinuxwireless.org/bugzilla/show_bug.cgi?id=245 Bug existed even with 1.2 firmware and .11 kernel. Intel, that's a great marketing slogan: stability everywhere! -- Evgeniy Polyakov --
[Evgeniy Polyakov - Mon, Sep 22, 2008 at 12:26:56AM +0400]
| On Mon, Sep 22, 2008 at 12:05:18AM +0400, Cyrill Gorcunov (gorcunov@gmail.com) wrote:
| > Since it's that serious maybe we should change
| >
| > IPW_DEBUG_INFO("%s: Fatal error value: 0x%08X\n",
| > priv->net_dev->name, priv->fatal_error);
| >
| > to printk(KERN_WARN)? And here is why - as I see now we can't say what
| > exactly is wrong - Evgeniy said he has a suspicious about firmware so
| > this WARNS will be collected by Arjan thru kerneloops and we could not
| > ask users to change debug level and repost problem - oops will have it
| > by default - and if it really firmware problem - firmware engineers could
| > find this _additional_ info usefull and resolve it (probably).
|
| The only reason for this change is to make a mark at the kerneloops.
| I.e. users know, there is a bug. Developers know, there is a bug.
| Everyone knows that there is a bug, but until it is at the special place
| we look to each other just like there is no bug.
|
| Here are dumps for example:
| http://www.intellinuxwireless.org/bugzilla/show_bug.cgi?id=245
|
| Bug existed even with 1.2 firmware and .11 kernel.
| Intel, that's a great marketing slogan: stability everywhere!
|
| --
| Evgeniy Polyakov
|

yes Evgeniy - all could know that but this register info could help firmware engineers to distinguish problems (without additional efforts like ask users to pass debug argument - kerneloops will have it by default) if there not only one exist. I mean I don't think anyone would reject additional info about problem ever :)

- Cyrill - --
Agreed.
diff --git a/drivers/net/wireless/ipw2100.c b/drivers/net/wireless/ipw2100.c
index 19a401c..36cdd57 100644
--- a/drivers/net/wireless/ipw2100.c
+++ b/drivers/net/wireless/ipw2100.c
@@ -206,6 +206,9 @@ MODULE_PARM_DESC(disable, "manually disable the radio (default 0 [radio on])");
static u32 ipw2100_debug_level = IPW_DL_NONE;
+static int ipw2100_max_fatal_ints = 10;
+module_param(ipw2100_max_fatal_ints, int, 0644);
+
#ifdef CONFIG_IPW2100_DEBUG
#define IPW_DEBUG(level, message...) \
do { \
@@ -3174,16 +3177,21 @@ static void ipw2100_irq_tasklet(struct ipw2100_priv *priv)
if (inta & IPW2100_INTA_FATAL_ERROR) {
printk(KERN_WARNING DRV_NAME
": Fatal interrupt. Scheduling firmware restart.\n");
+
+ printk(KERN_WARNING DRV_NAME ": INTA: 0x%08lX\n",
+ (unsigned long)inta & IPW_INTERRUPT_MASK);
+
priv->inta_other++;
write_register(dev, IPW_REG_INTA, IPW2100_INTA_FATAL_ERROR);
read_nic_dword(dev, IPW_NIC_FATAL_ERROR, &priv->fatal_error);
- IPW_DEBUG_INFO("%s: Fatal error value: 0x%08X\n",
+ printk(KERN_WARNING "%s: Fatal error value: 0x%08X\n",
priv->net_dev->name, priv->fatal_error);
read_nic_dword(dev, IPW_ERROR_ADDR(priv->fatal_error), &tmp);
- IPW_DEBUG_INFO("%s: Fatal error address value: 0x%08X\n",
+ printk(KERN_WARNING "%s: Fatal error address value: 0x%08X\n",
priv->net_dev->name, tmp);
+ WARN_ON(ipw2100_max_fatal_ints-- >= 0);
/* Wake up any sleeping jobs */
schedule_reset(priv);
--
Evgeniy Polyakov
--
Try putting it into D3 counting to 10 and powering it back up. Thats about as close as you can get to pulling the plug when it hangs. Alan --
I will experiment with this, thanks Alan. Unfortunately my machine builds this only updated driver for about 10 minutes, so results will appear not too quickly. I will start tests tomorrow. -- Evgeniy Polyakov --
I made several experiments with power states in the reset handler, like putting the device into D3 (hot), disabling the device, and saving/restoring state. Fatal interrupts continue to fire at essentially the same rate. The same error address does not always contain the same error value, but frequently it comes from a small finite set. Here is some data:

[41773.200686] ipw2100: Fatal interrupt. Scheduling firmware restart.
[41773.200707] eth1: Fatal error value: 0x500185B8, address: 0x08004501, inta: 0x40000000
[41773.200810] ipw2100 0000:02:04.0: PCI INT A disabled
[41773.203110] ipw2100: IRQ INTA == 0xFFFFFFFF
[41773.224446] ipw2100: IRQ INTA == 0xFFFFFFFF
[41773.245781] ipw2100: IRQ INTA == 0xFFFFFFFF
[41773.249360] ipw2100 0000:02:04.0: enabling device (0000 -> 0002)
[41773.249384] ipw2100 0000:02:04.0: PCI INT A -> Link[C0C8] -> GSI 11 (level, low) -> IRQ 11
[41773.249426] ipw2100 0000:02:04.0: restoring config space at offset 0x1 (was 0x2900002, writing 0x2900006)

That is quite harmless, since the interrupt handler just sees that the device is disappearing.

This brought me to think more about interrupt processing (the irq handler and the related tasklet), and I found races between the interrupt tasklet, the ipw2100_wx_event_work() handler, the reset task and probably others. Register access is in some cases protected by a lock (interrupt handler), and in some cases is not (all others). Every user first disables interrupts, but an interrupt may be being handled at that very moment and may already have scheduled the tasklet. Also, the priv->status field is frequently accessed and modified both with and without locks. This may be harmless, but it is still a red flag.

More data about the same failed address:

eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000
eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000
eth1: Fatal error value: 0x5000CEE4, address: 0x61C00000, inta: 0x40000000
eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000
eth1: Fatal error value: 0x5000CEE4, address: 0x61C00000, ...
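Anyone collecting such reports may want to tally the values mechanically rather than by eye. The following userspace sketch parses one dmesg line in the format shown above; `struct fatal_record`, `parse_fatal_line` and `MODEL_INTA_FATAL` are hypothetical names, and the bit value is taken from the inta field in these logs rather than from the driver headers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: the fatal bit as it appears in the logged inta value. */
#define MODEL_INTA_FATAL 0x40000000u

/* Hypothetical record for the three fields printed by the patch. */
struct fatal_record {
	unsigned int value;
	unsigned int address;
	unsigned int inta;
};

/* Returns 1 and fills *out when the line carries all three fields,
 * 0 otherwise (other driver messages, or a truncated line). */
static int parse_fatal_line(const char *line, struct fatal_record *out)
{
	const char *p = strstr(line, "Fatal error value:");

	if (p == NULL)
		return 0;
	/* %x accepts an optional "0x" prefix, as strtoul() does. */
	return sscanf(p, "Fatal error value: %x, address: %x, inta: %x",
		      &out->value, &out->address, &out->inta) == 3;
}
```

Feeding each log line through the parser and grouping on the address field would show quickly whether the error value at a given address really stays within a small set, as the data above suggests.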
I don't know if it is for this bug or a different one, but Matthew Garrett seem to have some pending patches. At least that is what he told me at PlumbersConf. Lets see if these patches do help. And please follow up with Arjan's suggestion and put a WARN_ON in the upstream code instead of waving CONFIG_BROKEN around. Regards Marcel --
Hi Marcel. I expect it is something new, since this bug exists at least from the 1.2 firmware version and .11 kernel. It was also reproduced (long ago though) on 2.4. -- Evgeniy Polyakov --
The fix I had for this was actually for ipw2200, but it ought to be applicable for 2100 as well. The ideal fix is probably to ensure that ipw*_down D3s the card and *_up D0s it, which brings enhanced runtime power saving and also has the nice side effect of actually resetting the damned POS in error cases. -- Matthew Garrett | mjg59@srcf.ucam.org --
Try D3ing the chip in the firmware restart code. Yes, it's retarded. -- Matthew Garrett | mjg59@srcf.ucam.org --
Thank you, I will start tests tomorrow. -- Evgeniy Polyakov --
I gotta admit, those "firmware restarts" were pretty annoying, and I'd always wondered why Intel themselves couldn't be bothered to fix 'em. -Kenny -- Kenneth R. Crudup Sr. SW Engineer, Scott County Consulting, Los Angeles O: 3630 S. Sepulveda Blvd. #138, L.A., CA 90034-6809 (888) 454-8181 --
