Re: ATA ACPI (was Re: Linux 2.6.21-rc5)

From: Linus Torvalds
Date: Sunday, March 25, 2007 - 4:08 pm

There are various fixes here, ranging from some architecture updates (ia64, 
ARM, MIPS, SH, Sparc64) to KVM, networking and network drivers.

And random one-liners.

But probably more important, and likely much more visible to most people, 
are the fixes for the fallout from the hrtimers and no-HZ changes, and some 
of the ACPI regressions.

Those timer changes ended up much more painful than anybody wished for, 
but big thanks to Thomas Gleixner for being on it like a weasel on a dead 
rat, and the regression list has kept shrinking.

So if you have reported a regression in the 2.6.21-rc series, please check 
2.6.21-rc5, and update your report as appropriate (whether fixed or "still 
problems with xyzzy").

		Linus

---
Adrian Bunk (3):
      X86_P4_CLOCKMOD must select CPU_FREQ_TABLE
      [X25] x25_forward_call(): fix NULL dereferences
      drivers/video/s3fb.c: fix a use-before-check

Akira Iguchi (1):
      drivers/ata/Kconfig: PATA_SCC depends on wrong platform

Alan Stern (1):
      usblp: quirk flag and device entry for Seiko Epson M129C printer

Alan Tyson (1):
      [CIFS] reset mode when client notices that ATTR_READONLY is no longer set

Alessandro Zummo (1):
      pata_ixp4xx_cf: fix interrupt

Alexandr Andreev (1):
      x86-64: wire up compat sched_rr_get_interval(2)

Alexey Dobriyan (1):
      [NET]: Copy mac_len in skb_clone() as well

Alexey Starikovskiy (1):
      ACPI: resolve HP nx6125 S3 immediate wakeup regression

Andi Kleen (4):
      x86-64: Update defconfig
      i386: Update defconfig
      i386: Enforce GPLness of VMI ROM
      x86: Export _proxy_pda for gcc 4.2

Andrew Johnson (1):
      swsusp: fix suspend when console is in VT_AUTO+KD_GRAPHICS mode

Andrew Morton (2):
      machzwd warning fix
      "ext[34]: EA block reference count racing fix" performance fix

Andy Isaacson (1):
      fix read past end of array in md/linear.c

Ankita Garg (1):
      oom fix: prevent oom from killing a process with children/sibling ...
From: Thomas Gleixner
Date: Monday, March 26, 2007 - 1:55 am

Why certainly ! I caused them, so I have to fix them. There are still a

This fix from John Stultz is still missing:

http://lkml.org/lkml/2007/3/22/287

It's in Andrew's queue already and waits to be sent to you.

	tglx



-

From: Bob Tracy
Date: Monday, March 26, 2007 - 5:25 am

In summary, that fix is a workaround to allow the acpi_pm clocksource
to be selected instead of the pit clocksource, thereby allowing my
Dell laptop with the PIIX4 bug to boot.  Other apic, clocksource, etc.
patches that were included in -rc5 fixed the problem that caused the
boot process to hang when the pit clocksource was selected, as I
suspected would be the case :-).

Per John's message in the above URL, while the fix is no longer needed
for allowing the laptop to boot, it's probably still "a good thing" to
allow a better clocksource to be selected.

-- 
-----------------------------------------------------------------------
Bob Tracy                   WTO + WIPO = DMCA? http://www.anti-dmca.org
rct@frus.com
-----------------------------------------------------------------------
-

From: Thomas Gleixner
Date: Monday, March 26, 2007 - 5:30 am

Yes. The read-three-times pmtimer is faster and more reliable than the
PIT.

	tglx



-

From: Thomas Gleixner
Date: Monday, March 26, 2007 - 2:21 am

The current sysfs support of clockevents does not obey the "only one
value per file" rule.

The real fix is not 2.6.21 material. Therefore remove the sysfs support
for now.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 67932ea..76212b2 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -274,72 +274,3 @@ void clockevents_notify(unsigned long reason, void *arg)
 }
 EXPORT_SYMBOL_GPL(clockevents_notify);
 
-#ifdef CONFIG_SYSFS
-
-/**
- * clockevents_show_registered - sysfs interface for listing clockevents
- * @dev:	unused
- * @buf:	char buffer to be filled with clock events list
- *
- * Provides sysfs interface for listing registered clock event devices
- */
-static ssize_t clockevents_show_registered(struct sys_device *dev, char *buf)
-{
-	struct list_head *tmp;
-	char *p = buf;
-	int cpu;
-
-	spin_lock(&clockevents_lock);
-
-	list_for_each(tmp, &clockevent_devices) {
-		struct clock_event_device *ce;
-
-		ce = list_entry(tmp, struct clock_event_device, list);
-		p += sprintf(p, "%-20s F:%04x M:%d", ce->name,
-			     ce->features, ce->mode);
-		p += sprintf(p, " C:");
-		if (!cpus_equal(ce->cpumask, cpu_possible_map)) {
-			for_each_cpu_mask(cpu, ce->cpumask)
-				p += sprintf(p, " %d", cpu);
-		} else {
-			/*
-			 * FIXME: Add the cpu which is handling this sucker
-			 */
-		}
-		p += sprintf(p, "\n");
-	}
-
-	spin_unlock(&clockevents_lock);
-
-	return p - buf;
-}
-
-/*
- * Sysfs setup bits:
- */
-static SYSDEV_ATTR(registered, 0600,
-		   clockevents_show_registered, NULL);
-
-static struct sysdev_class clockevents_sysclass = {
-	set_kset_name("clockevents"),
-};
-
-static struct sys_device clockevents_sys_device = {
-	.id	= 0,
-	.cls	= &clockevents_sysclass,
-};
-
-static int __init clockevents_sysfs_init(void)
-{
-	int error = sysdev_class_register(&clockevents_sysclass);
-
-	if (!error)
-		error = ...
From: Thomas Gleixner
Date: Tuesday, March 27, 2007 - 12:08 am

The clockevents / tick management code expects an error value, when the
event is already expired. hpet_next_event() returns 1 in that case.

Fix it to return the proper -ETIME error code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

diff --git a/arch/i386/kernel/hpet.c b/arch/i386/kernel/hpet.c
index f3ab61e..76afea6 100644
--- a/arch/i386/kernel/hpet.c
+++ b/arch/i386/kernel/hpet.c
@@ -197,7 +197,7 @@ static int hpet_next_event(unsigned long delta,
 	cnt += delta;
 	hpet_writel(cnt, HPET_T0_CMP);
 
-	return ((long)(hpet_readl(HPET_COUNTER) - cnt ) > 0);
+	return ((long)(hpet_readl(HPET_COUNTER) - cnt ) > 0) ? -ETIME : 0;
 }
 
 /*


-
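
The hpet_next_event() fix above rests on two details: the "already
expired" test uses a signed difference, so it stays correct when the
counter wraps, and the result is mapped to -ETIME instead of a bare 1.
A minimal user-space sketch of that convention follows. The function
name next_event_status() and its parameters are hypothetical, and a
32-bit counter is assumed.

```c
#include <assert.h>   /* for the checks shown in the note below */
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of the check in hpet_next_event().
 * 'counter' is the value read back after programming the comparator,
 * 'cmp' is the comparator value that was written. A 32-bit counter
 * is an assumption of this sketch.
 */
int next_event_status(uint32_t counter, uint32_t cmp)
{
	/* The signed difference stays correct across counter wraparound. */
	int32_t behind = (int32_t)(counter - cmp);

	/* Counter already past the comparator: the event was missed. */
	return behind > 0 ? -ETIME : 0;
}
```

With this model, assert(next_event_status(201, 200) == -ETIME) holds,
and a caller that tests for a negative return value sees the expired
event as an error rather than as success.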

From: Ingo Molnar
Date: Monday, March 26, 2007 - 2:25 am

Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo
-

From: Greg KH
Date: Monday, March 26, 2007 - 11:57 am

Acked-by: Greg Kroah-Hartman <gregkh@suse.de>

Thanks Thomas for doing this.

greg k-h
-

From: Ingo Molnar
Date: Monday, March 26, 2007 - 1:31 am

here's a new v2.6.20 -> v2.6.21 forcedeth.c regression:

in the last week or so i've been seeing sporadic under-load forcedeth.c 
crashes (see the full oops further below):

 eth1: too many iterations (6) in nv_nic_irq.
 Unable to handle kernel NULL pointer dereference at 0000000000000088 RIP: 
 [<ffffffff80404587>] nv_tx_done+0xf4/0x1cf

this is line 1906 of drivers/net/forcedeth.c:

    np->stats.tx_bytes += np->get_tx_ctx->skb->len;

struct sk_buff's len field is at offset 0x88, so np->get_tx_ctx->skb is 
NULL. That is an 'impossible' scenario for tx descriptors here - the tx 
ring descriptors are always set up with a valid skb (and a valid dma 
address), and their completion is serialized via np->lock.
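
The step from the fault address in the oops (0x88) to "the skb pointer
is NULL" can be sketched in user space: reading a field through a NULL
struct pointer touches the address equal to that field's offset. The
struct below is a hypothetical stand-in padded so that len sits at
0x88; it is not the real struct sk_buff layout.

```c
#include <assert.h>   /* for the checks shown in the note below */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: padding chosen so that 'len' is at 0x88. */
struct fake_skb {
	unsigned char pad[0x88];
	unsigned int len;
};

/* Address that a load of base->len would touch. */
uintptr_t len_access_addr(uintptr_t base)
{
	return base + offsetof(struct fake_skb, len);
}

/* Returns 1 if the fault address is what a NULL base would produce. */
int fault_matches_null_base(uintptr_t fault_addr)
{
	return fault_addr == len_access_addr(0);
}
```

Here assert(fault_matches_null_base(0x88)) holds, which is the same
inference as in the message: a fault exactly at the field offset points
at a NULL base pointer.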

these crashes are almost instant on the .21-rc5-rt kernel, but extremely 
sporadic on the upstream kernel and needed very high networking loads to 
trigger. Today i found a good way to trigger it almost instantly on 
upstream kernels too: apply the debug patch attached further below and 
do:

	echo 100 > /proc/sys/kernel/panic

that will inject 100 artificial 'too many iterations' failures and 
provoke a TX timeout, and that TX timeout will crash. (i've used a 
dual-core Athlon64 system in this test)

my first quick guess was to extend np->lock locking to the whole of 
nv_start_xmit/nv_start_xmit_optimized - while that appeared to make the 
crash a bit less likely, it did not prevent it. So there must be some 
other, more fundamental problem left as well. At first glance the SMP 
locking looks OK, so maybe the ring indices are messed up somehow and we 
got into a 'ring head bites the tail' scenario?

i can provide more info if needed.

	Ingo

-------------->
eth1: too many iterations (6) in nv_nic_irq.
Unable to handle kernel NULL pointer dereference at 0000000000000088 RIP: 
 [<ffffffff80404587>] nv_tx_done+0xf4/0x1cf
PGD 34d03067 PUD 34d02067 PMD 0 
Oops: 0000 [1] PREEMPT SMP 
CPU 1 
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.21-rc5 ...
From: Ingo Molnar
Date: Monday, March 26, 2007 - 1:39 am

to be specific, the patch below is what i tried - but it didnt 
completely fix the crash.

	Ingo

---
 drivers/net/forcedeth.c |   15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

Index: linux/drivers/net/forcedeth.c
===================================================================
--- linux.orig/drivers/net/forcedeth.c
+++ linux/drivers/net/forcedeth.c
@@ -1650,9 +1650,10 @@ static int nv_start_xmit(struct sk_buff 
 			   ((skb_shinfo(skb)->frags[i].size & (NV_TX2_TSO_MAX_SIZE-1)) ? 1 : 0);
 	}
 
+	spin_lock_irq(&np->lock);
+
 	empty_slots = nv_get_empty_tx_slots(np);
 	if (unlikely(empty_slots <= entries)) {
-		spin_lock_irq(&np->lock);
 		netif_stop_queue(dev);
 		np->tx_stop = 1;
 		spin_unlock_irq(&np->lock);
@@ -1718,8 +1719,6 @@ static int nv_start_xmit(struct sk_buff 
 		tx_flags_extra = skb->ip_summed == CHECKSUM_PARTIAL ?
 			 NV_TX2_CHECKSUM_L3 | NV_TX2_CHECKSUM_L4 : 0;
 
-	spin_lock_irq(&np->lock);
-
 	/* set tx flags */
 	start_tx->flaglen |= cpu_to_le32(tx_flags | tx_flags_extra);
 	np->put_tx.orig = put_tx;
@@ -1766,9 +1765,10 @@ static int nv_start_xmit_optimized(struc
 			   ((skb_shinfo(skb)->frags[i].size & (NV_TX2_TSO_MAX_SIZE-1)) ? 1 : 0);
 	}
 
+	spin_lock_irq(&np->lock);
+
 	empty_slots = nv_get_empty_tx_slots(np);
 	if (unlikely(empty_slots <= entries)) {
-		spin_lock_irq(&np->lock);
 		netif_stop_queue(dev);
 		np->tx_stop = 1;
 		spin_unlock_irq(&np->lock);
@@ -1846,8 +1846,6 @@ static int nv_start_xmit_optimized(struc
 			start_tx->txvlan = 0;
 	}
 
-	spin_lock_irq(&np->lock);
-
 	/* set tx flags */
 	start_tx->flaglen |= cpu_to_le32(tx_flags | tx_flags_extra);
 	np->put_tx.ex = put_tx;
@@ -3484,6 +3482,7 @@ static void nv_do_nic_poll(unsigned long
 	struct net_device *dev = (struct net_device *) data;
 	struct fe_priv *np = netdev_priv(dev);
 	u8 __iomem *base = get_hwbase(dev);
+	unsigned long flags;
 	u32 mask = 0;
 
 	/*
@@ -3519,7 +3518,7 @@ static void nv_do_nic_poll(unsigned long
 ...
From: Ingo Molnar
Date: Monday, March 26, 2007 - 1:58 am

the patch below works around the crash. It does not seem to be a 'tx 
ring head bites the tail' scenario:

 get_tx: 55, put_tx: 57
 get_tx: 80, put_tx: 86
 get_tx: 88, put_tx: 97
 get_tx: 97, put_tx: 109
 get_tx: 97, put_tx: 109
 get_tx: 111, put_tx: 117
 get_tx: 117, put_tx: 125
 get_tx: 127, put_tx: 137
 get_tx: 137, put_tx: 147
 get_tx: 147, put_tx: 149

	Ingo
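
The 'head bites the tail' hypothesis can be tested from the get/put
indices alone. The sketch below assumes a single ring of 'size'
entries with the put index running ahead of the get index modulo
'size', and one slot kept unused to tell a full ring from an empty
one. The helper names and the ring size of 256 used in the note are
hypothetical; the thread does not state the actual ring size.

```c
#include <assert.h>   /* for the checks shown in the note below */

/* Number of descriptors queued but not yet completed. */
unsigned int ring_in_flight(unsigned int get, unsigned int put,
			    unsigned int size)
{
	return (put + size - get) % size;
}

/*
 * Free descriptors left for the producer. One slot is kept unused so
 * that get == put always means "empty" (an assumption of this model).
 */
unsigned int ring_free_slots(unsigned int get, unsigned int put,
			     unsigned int size)
{
	return size - 1 - ring_in_flight(get, put, size);
}
```

For the logged pairs, ring_in_flight(55, 57, 256) is 2 and
ring_in_flight(97, 109, 256) is 12. Only a handful of descriptors are
outstanding in each sample, so under these assumptions the producer is
nowhere near overwriting pending entries.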

------------>
From: Ingo Molnar <mingo@elte.hu>
Subject: [patch] forcedeth: work around NULL skb dereference crash

work around a NULL skb dereference crash that occurs during high load.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 drivers/net/forcedeth.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

Index: linux/drivers/net/forcedeth.c
===================================================================
--- linux.orig/drivers/net/forcedeth.c
+++ linux/drivers/net/forcedeth.c
@@ -1902,6 +1902,11 @@ static void nv_tx_done(struct net_device
 						np->stats.tx_carrier_errors++;
 					np->stats.tx_errors++;
 				} else {
+					if (!np->get_tx_ctx->skb) {
+						printk("get_tx: %ld, put_tx: %ld\n", np->get_tx_ctx - np->first_tx_ctx, np->put_tx_ctx - np->first_tx_ctx);
+						WARN_ON(1);
+						break;
+					}
 					np->stats.tx_packets++;
 					np->stats.tx_bytes += np->get_tx_ctx->skb->len;
 				}
@@ -1917,6 +1922,11 @@ static void nv_tx_done(struct net_device
 						np->stats.tx_carrier_errors++;
 					np->stats.tx_errors++;
 				} else {
+					if (!np->get_tx_ctx->skb) {
+						printk("get_tx: %ld, put_tx: %ld\n", np->get_tx_ctx - np->first_tx_ctx, np->put_tx_ctx - np->first_tx_ctx);
+						WARN_ON(1);
+						break;
+					}
 					np->stats.tx_packets++;
 					np->stats.tx_bytes += np->get_tx_ctx->skb->len;
 				}
-

From: Ingo Molnar
Date: Monday, April 2, 2007 - 4:56 am

From: Ingo Molnar <mingo@elte.hu>
Subject: [patch] forcedeth.c: improve NAPI handler

another forcedeth.c thing: i noticed that its NAPI handler does not do 
tx-ring processing. The patch below implements this - tested on 
DESC_VER_2 hardware, with CONFIG_FORCEDETH_NAPI=y.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

Index: linux/drivers/net/forcedeth.c
===================================================================
--- linux.orig/drivers/net/forcedeth.c
+++ linux/drivers/net/forcedeth.c
@@ -3118,9 +3118,17 @@ static int nv_napi_poll(struct net_devic
 	int retcode;
 
 	if (np->desc_ver == DESC_VER_1 || np->desc_ver == DESC_VER_2) {
+		spin_lock_irqsave(&np->lock, flags);
+		nv_tx_done(dev);
+		spin_unlock_irqrestore(&np->lock, flags);
+
 		pkts = nv_rx_process(dev, limit);
 		retcode = nv_alloc_rx(dev);
 	} else {
+		spin_lock_irqsave(&np->lock, flags);
+		nv_tx_done_optimized(dev, np->tx_ring_size);
+		spin_unlock_irqrestore(&np->lock, flags);
+
 		pkts = nv_rx_process_optimized(dev, limit);
 		retcode = nv_alloc_rx_optimized(dev);
 	}
-

From: Ayaz Abdulla
Date: Monday, March 26, 2007 - 1:17 am

This issue might be resolved with the patch provided in the following 
bug report: http://bugzilla.kernel.org/show_bug.cgi?id=8058

Please try out the patch in the bug report without your patch and see if 
the issue reproduces.

Ayaz


-

From: Ingo Molnar
Date: Monday, March 26, 2007 - 2:04 am

trying to debug the forcedeth crash triggered another, new
v2.6.20 -> v2.6.21 regression:

maxcpus=1 on a dual-core system crashes the x86_64 SMP kernel in 
lock_policy_rwsem_write() - see the crash log below. Config attached.

i suspect it could be related to this recent commit:

 commit 5a01f2e8f3ac134e24144d74bb48a60236f7024d
 Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
 Date:   Mon Feb 5 16:12:44 2007 -0800

    [CPUFREQ] Rewrite lock in cpufreq to eliminate cpufreq/hotplug related issue

	Ingo

---------------->
Linux version 2.6.21-rc5 (mingo@dione) (gcc version 4.0.2) #14 SMP PREEMPT Mon Mar 26 10:51:51 CEST 2007
Command line: root=/dev/hda5 earlyprintk=serial,ttyS0,115200 console=ttyS0,115200 console=tty 3 profile=0 debug initcall_debug noapic apic=debug maxcpus=1 selinux=0 netconsole=4444@10.0.1.12/eth0,4444@10.0.1.14/00:16:76:ab:6e:84 nmi_watchdog=2 ignore_loglevel debug_dir
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
 BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
 BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS)
 BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
Entering add_active_range(0, 0, 159) 0 entries of 3200 used
Entering add_active_range(0, 256, 262128) 1 entries of 3200 used
end_pfn_map = 1048576
DMI 2.3 present.
ACPI: RSDP 000F76F0, 0014 (r0 Nvidia)
ACPI: RSDT 3FFF3040, 0034 (r1 Nvidia AWRDACPI 42302E31 AWRD        0)
ACPI: FACP 3FFF30C0, 0074 (r1 Nvidia AWRDACPI 42302E31 AWRD        0)
ACPI: DSDT 3FFF3180, 6264 (r1 NVIDIA AWRDACPI     1000 MSFT  100000E)
ACPI: FACS 3FFF0000, 0040
ACPI: SRAT 3FFF9500, 00A0 (r1 AMD    HAMMER          1 AMD         1)
ACPI: MCFG 3FFF9600, 003C (r1 Nvidia AWRDACPI 42302E31 AWRD        0)
ACPI: APIC ...
From: Venki Pallipadi
Date: Monday, March 26, 2007 - 11:12 am

Looks like some error with cpufreq add_dev and remove_dev got exposed  
by the above patch.

The problem here is:

cpufreq_register_driver() calls sysdev_driver_register()
which in turn calls add() for the CPU already present and ignores the  
return value from that add().
(drivers/base/sys.c:183).
Now if that add had failed, cpufreq will not know, and later a remove()
is called on the same device, which causes the BUG() condition here.

However, I am not sure at this point why this gets triggered only in
the maxcpus=1 case and not in the normal boot case.

Will dig into this a bit more and hopefully have a patch to fix this  
soon.

Thanks,
~Venki

-

From: Venki Pallipadi
Date: Monday, March 26, 2007 - 12:03 pm

Ingo,

Does the patch below help?

Thanks,
Venki


Patch to resolve maxcpus=1 triggering BUG() as reported by Ingo here

lkml subject:
"2.6.21-rc5: maxcpus=1 crash in cpufreq: kernel BUG at drivers/cpufreq/cpufreq.c:82!"

This check added to remove_dev  is symmetric to one in add_dev and handles
callbacks for offline cpus cleanly.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

Index: new/drivers/cpufreq/cpufreq.c
===================================================================
--- new.orig/drivers/cpufreq/cpufreq.c	2007-03-22 07:43:37.000000000 -0800
+++ new/drivers/cpufreq/cpufreq.c	2007-03-26 10:07:06.000000000 -0800
@@ -1015,6 +1015,10 @@
 {
 	unsigned int cpu = sys_dev->id;
 	int retval;
+
+	if (cpu_is_offline(cpu))
+		return 0;
+
 	if (unlikely(lock_policy_rwsem_write(cpu)))
 		BUG();
 
-
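
The symmetry argument behind the patch above can be modelled in user
space: if the add path bails out early for an offline CPU, the remove
path has to do the same, otherwise it tears down state that was never
set up. This is only a loose model. All names are hypothetical, the
online[] array stands in for cpu_is_offline(), and the failure the
kernel reports through BUG() is represented by a -1 return.

```c
#include <assert.h>   /* for the checks shown in the note below */
#include <stdbool.h>

#define NCPUS 2

static bool online[NCPUS];       /* stand-in for the CPU online map */
static bool initialised[NCPUS];  /* per-CPU state created by add */

void model_set_online(int cpu, bool on)
{
	online[cpu] = on;
}

int model_add_dev(int cpu)
{
	if (!online[cpu])
		return 0;	/* nothing is set up for offline CPUs */
	initialised[cpu] = true;
	return 0;
}

/* 'guarded' selects whether remove has the symmetric offline check. */
int model_remove_dev(int cpu, bool guarded)
{
	if (guarded && !online[cpu])
		return 0;
	if (!initialised[cpu])
		return -1;	/* teardown of state that was never set up */
	initialised[cpu] = false;
	return 0;
}
```

With CPU 1 offline, model_remove_dev(1, false) returns -1 while
model_remove_dev(1, true) returns 0, which mirrors the effect of adding
the cpu_is_offline() check to the remove path.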

From: Ingo Molnar
Date: Tuesday, March 27, 2007 - 12:11 am

yes, it solves it, thanks!

Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo
-

From: Ingo Molnar
Date: Monday, March 26, 2007 - 3:11 am

hm, on a T60, after suspend/resume, i get an e1000 timeout:

e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue             <0>
  TDH                  <ec>
  TDT                  <ec>
  next_to_use          <ec>
  next_to_clean        <82>
buffer_info[next_to_clean]
  time_stamp           <fffcc3db>
  next_to_watch        <82>
  jiffies              <fffd5da0>
  next_to_watch.status <1>

it works fine after that reset. The e1000 driver didn't do this before: 
after resume the network was always available immediately. So this 
appears to be a relatively new regression (post-rc3 or so). high-res 
timers were disabled.

	Ingo
-

From: Kok, Auke
Date: Monday, March 26, 2007 - 8:39 am

TDT == TDH -> this is a 'bogus' tx hang indicating that one or more parts
of the TX path are not properly enabled.

Most likely, I suspect that we haven't enabled something because the ordering
of irq free/alloc was messed up and nobody cared before, but with all the
pci_save_state fixes going in we hit a bump.

The reset kicks it all back up in order so it's something silly like this for
sure.

The attached patch fixes that and has been sitting in my queue for a few
days. Can you see if that works?

Auke


---
e1000: Free interrupts symmetrically with resume

From: Auke Kok <auke-jan.h.kok@intel.com>

Free interrupts symmetrically with resume allocation to prevent
pci save/restore state from possibly failing or warning.

Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
---

 drivers/net/e1000/e1000_main.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 55ef148..93d41f0 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -5190,6 +5190,7 @@ e1000_suspend(struct pci_dev *pdev, pm_message_t state)
 	if (netif_running(netdev)) {
 		WARN_ON(test_bit(__E1000_RESETTING, &adapter->flags));
 		e1000_down(adapter);
+		e1000_free_irq(adapter);
 	}
 
 #ifdef CONFIG_PM
@@ -5257,9 +5258,6 @@ e1000_suspend(struct pci_dev *pdev, pm_message_t state)
 	if (adapter->hw.phy.type == e1000_phy_igp_3)
 		e1000_igp3_phy_powerdown_workaround_ich8lan(&adapter->hw);
 
-	if (netif_running(netdev))
-		e1000_free_irq(adapter);
-
 	/* Release control of h/w to f/w.  If f/w is AMT enabled, this
 	 * would have already happened in close and is redundant. */
 	e1000_release_hw_control(adapter);
-

From: Jesse Brandeburg
Date: Monday, March 26, 2007 - 8:50 am

was there a "NETDEV WATCHDOG" message that follows this?  If not it is
a harmless debug print.  Note the time_stamp and jiffies difference,
very large, consistent with a resume.  I think we need to disable the
internal e1000 tx hang code that causes this debug print when we are
suspending.  I'll work with auke to generate a short patch.
-
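
The time_stamp/jiffies gap mentioned above can be worked out from the
hex values in the log (time_stamp fffcc3db, jiffies fffd5da0).
Unsigned 32-bit subtraction gives the elapsed ticks even if the
counter wraps in between. The tick rate is not stated in the thread,
so the hz argument below is an assumption, and the function names are
hypothetical.

```c
#include <assert.h>   /* for the checks shown in the note below */
#include <stdint.h>

/* Wraparound-safe distance between two 32-bit tick counts. */
uint32_t ticks_elapsed(uint32_t now, uint32_t then)
{
	return now - then;
}

/* Whole seconds for a given tick rate; 'hz' is supplied by the caller. */
uint32_t ticks_to_seconds(uint32_t ticks, uint32_t hz)
{
	return ticks / hz;
}
```

ticks_elapsed(0xfffd5da0, 0xfffcc3db) is 39365 ticks. That would be
about 39 seconds if HZ were 1000 and about 157 seconds if HZ were 250;
either way it is far longer than a normal Tx completion, consistent
with the timestamp having been taken before the suspend.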

From: Ingo Molnar
Date: Monday, March 26, 2007 - 10:39 am

there was no "NETDEV WATCHDOG" message. But still there was a ~30 
seconds delay until i got the first few packets through the interface - 
while normally it's available almost instantly after resume. But ... 
this condition seems sporadic, i havent seen it on subsequent 
suspend+resume attempts.

	Ingo
-

From: Kok, Auke
Date: Monday, March 26, 2007 - 8:55 am

hmm, yeah, it appears that the patch I sent just a second ago isn't applicable 
in this case, since the irq handler is obviously enabled (the Link Up message 
proves that).

thanks to Jesse for being awake :)


Auke
-

From: Greg KH
Date: Monday, March 26, 2007 - 11:20 pm

I'd prefer to wait until 2.6.22 for this one, I've had too many odd
reports of problems in this area, and since no one has reported this
issue, it's not a real rush at all.

thanks,

greg k-h
-

From: Jesse Barnes
Date: Tuesday, March 27, 2007 - 9:49 am

Yeah, I don't think this one is critical.  These files aren't in heavy 
use yet, so fixing this in 2.6.22 should be ok.  I've only heard one 
complaint about this bug so far, and that was caused by some code still 
in development.

Jesse
-

From: Andi Kleen
Date: Tuesday, March 27, 2007 - 5:25 am

This is already fixed in a different way.

-Andi


-

From: Andrew Morton
Date: Tuesday, March 27, 2007 - 9:33 am

oh yeah.  Did that fix make it into 2.6.20.x?

I think we decided that make-aout-executables-work-again.patch might still
be a desirable thing to have, but I don't recall the reasoning for that.

Anyway, if it doesn't fix a bug it is nowhere near a high-priority patch
for that seething bugfest which we like to call a kernel, so I'll drop it.
-

From: Tilman Schmidt
Date: Wednesday, March 28, 2007 - 3:32 pm

[CC list trimmed]

It's not on that list, but would you mind slipping
drivers-isdn-gigaset-mark-some-static-data-as-const-v2.patch
into 2.6.21 too? It's largely trivial but I'd like to get it
out of the door.

Thanks,
Tilman

-- 
Tilman Schmidt                          E-Mail: tilman@imap.cc
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Ungeoeffnet mindestens haltbar bis: (siehe Rueckseite)
-

From: Dmitry Torokhov
Date: Tuesday, March 27, 2007 - 5:43 am

A slightly different fix is already in the input tree. I'd prefer waiting
for 2.6.22 as this only affects touchpads when not using the synaptics X
driver (which most distributions use by default).

-- 
Dmitry
-

From: Takashi Iwai
Date: Tuesday, March 27, 2007 - 2:49 am

At Mon, 26 Mar 2007 22:17:31 -0800,

The better fix is already in rc5, so please drop this one from your
tree.

    c26a8de23a4417f556250c4c099b048b26c430be
    [ALSA] ac97 - fix AD shared shared jack control logic


thanks,

Takashi
-

From: Adrian Bunk
Date: Monday, March 26, 2007 - 6:59 pm

This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge amount of recipients, please trim the Cc when answering.


Subject    : crashes in KDE
References : http://bugzilla.kernel.org/show_bug.cgi?id=8157
Submitter  : Oliver Pinter <oliver.pntr@gmail.com>
Status     : unknown


Subject    : kwin dies silently  (sysctl related?)
References : http://lkml.org/lkml/2007/2/28/112
Submitter  : Sid Boyce <g3vbv@blueyonder.co.uk>
             Boris Mogwitz <boris@macbeth.rhoen.de>
Status     : submitter was asked to bisect further


Subject    : problem with sockets
References : http://lkml.org/lkml/2007/3/21/248
Submitter  : Jose Alberto Reguero <jareguero@telefonica.net>
Status     : unknown


Subject    : e1000 resume weirdness
References : http://lkml.org/lkml/2007/3/26/91
Submitter  : Ingo Molnar <mingo@elte.hu>
Handled-By : Jesse Brandeburg <jesse.brandeburg@gmail.com>
             Auke Kok <auke-jan.h.kok@intel.com>
Status     : problem is being debugged


Subject    : forcedeth: sporadic under-load crashes
References : http://lkml.org/lkml/2007/3/26/63
Submitter  : Ingo Molnar <mingo@elte.hu>
Handled-By : Ingo Molnar <mingo@elte.hu>
             Ayaz Abdulla <aabdulla@nvidia.com>
Status     : problem is being debugged


Subject    : forcedeth: skb_over_panic
References : http://bugzilla.kernel.org/show_bug.cgi?id=8058
Submitter  : Albert Hopkins <kernel@marduk.letterboxes.org>
Handled-By : Ayaz Abdulla <aabdulla@nvidia.com>
Patch      : http://bugzilla.kernel.org/show_bug.cgi?id=8058
Status     : patch available



-

From: Ingo Molnar
Date: Friday, March 30, 2007 - 5:04 am

i just found a new category of driver regressions in 2.6.21, doing 
allyesconfig bzImage bootup tests: the init methods of various drivers 
hang in driver_unregister().

It is caused by this problem: the semantics of driver_unregister() [also 
implicitly called in pci_driver_unregister()] has apparently changed 
recently. If a driver does:

	pci_register_driver(&my_driver);
	...
	if (some_failure) {
		pci_unregister_driver(&my_driver);
		...
	}

it will hang the bootup in the following piece of code:

 drivers/base/driver.c:

  void driver_unregister(struct device_driver * drv)
  {
         bus_remove_driver(drv);
         wait_for_completion(&drv->unloaded);

the completion is never done - because nobody removes the bus while the 
init is still happening, obviously. (and bootup is serialized anyway)

now, the majority of drivers do the driver unregistry from their 
module-cleanup function, so they are not affected by this problem. But if 
you apply the debug patch attached further below, and do an allyesconfig 
bzImage bootup, there are 3 hits already:

 BUG: at drivers/base/driver.c:187 driver_unregister()
  [<c0105ff9>] show_trace_log_lvl+0x19/0x2e
  [<c01063e2>] show_trace+0x12/0x14
  [<c01063f8>] dump_stack+0x14/0x16
  [<c063f7e6>] driver_unregister+0x3d/0x43
  [<c0488048>] pci_unregister_driver+0x10/0x5f
  [<c1b5f7c7>] slgt_init+0x9b/0x1ca
  [<c1b31a2d>] init+0x15d/0x2bd
  [<c0105bc3>] kernel_thread_helper+0x7/0x10

 BUG: at drivers/base/driver.c:187 driver_unregister()
  [<c0105ff9>] show_trace_log_lvl+0x19/0x2e
  [<c01063e2>] show_trace+0x12/0x14
  [<c01063f8>] dump_stack+0x14/0x16
  [<c063f7e6>] driver_unregister+0x3d/0x43
  [<c0488048>] pci_unregister_driver+0x10/0x5f
  [<c0619505>] init_ipmi_si+0x70a/0x738
  [<c1b31a2d>] init+0x15d/0x2bd
  [<c0105bc3>] kernel_thread_helper+0x7/0x10

 BUG: at drivers/base/driver.c:187 driver_unregister()
  [<c0105ff9>] show_trace_log_lvl+0x19/0x2e
  [<c01063e2>] show_trace+0x12/0x14
  [<c01063f8>] ...

there's a new type of message in allyesconfig-bzImage bootup test:

Calling initcall 0xc1b6d692: fixed_init+0x0/0x33()
Fixed PHY: Registered new driver
Device 'fixed@100:1' does not have a release() function, it is broken and must be fixed.
BUG: at drivers/base/core.c:120 device_release()
 [<c0105ff9>] show_trace_log_lvl+0x19/0x2e
 [<c01063e2>] show_trace+0x12/0x14
 [<c01063f8>] dump_stack+0x14/0x16
 [<c063cddf>] device_release+0x7c/0x7e
 [<c0476c32>] kobject_cleanup+0x44/0x5e
 [<c0476c57>] kobject_release+0xb/0xd
 [<c04773ef>] kref_put+0x63/0x71
 [<c0476757>] kobject_put+0x14/0x16
 [<c063ceef>] put_device+0x11/0x13
 [<c063d943>] device_unregister+0x12/0x15
 [<c07337d1>] fixed_mdio_register_device+0x210/0x23b
 [<c1b6d6b0>] fixed_init+0x1e/0x33
 [<c1b31a2d>] init+0x15d/0x2bd
 [<c0105bc3>] kernel_thread_helper+0x7/0x10
 =======================
Device 'fixed@10:1' does not have a release() function, it is broken and must be fixed.
BUG: at drivers/base/core.c:120 device_release()
 [<c0105ff9>] show_trace_log_lvl+0x19/0x2e
 [<c01063e2>] show_trace+0x12/0x14
 [<c01063f8>] dump_stack+0x14/0x16
 [<c063cddf>] device_release+0x7c/0x7e
 [<c0476c32>] kobject_cleanup+0x44/0x5e
 [<c0476c57>] kobject_release+0xb/0xd
 [<c04773ef>] kref_put+0x63/0x71
 [<c0476757>] kobject_put+0x14/0x16
 [<c063ceef>] put_device+0x11/0x13
 [<c063d943>] device_unregister+0x12/0x15
 [<c07337d1>] fixed_mdio_register_device+0x210/0x23b
 [<c1b6d6c1>] fixed_init+0x2f/0x33
 [<c1b31a2d>] init+0x15d/0x2bd
 [<c0105bc3>] kernel_thread_helper+0x7/0x10
 =======================
Calling initcall 0xc1b6d6c5: sundance_init+0x0/0x16()
Calling initcall 0xc1b6d6db: hamachi_init+0x0/0x16()
-


That means that whatever driver has fixed_mdio_register_device() in it
is broken and needs to be fixed.

It is independent from your previous question about unregistering the
device from within module_init().

thanks,

greg k-h
-


(yes - hence the different subject line, etc.)

	Ingo
-


On Fri, 30 Mar 2007 16:25:14 +0200



-- 
Sincerely, 
Vitaly
-

From: Greg KH
Date: Friday, March 30, 2007 - 7:16 am

Yes, we should allow the ability to call unregister_driver from within
the module_init function.

But I don't understand what is causing you to see this problem.  Who is
holding the reference on the struct device at this point in time?  Is it
the fact that userspace has some files open and it hasn't released them
yet?

I don't see anything implicit in the driver_unregister() path that
should not work from within the module_init() path.  Kay, am I missing
anything here?

(patch left below for Kay's benefit)

thanks,

-

From: Ingo Molnar
Date: Friday, March 30, 2007 - 10:46 am

at least in the slgt_init() case the affected codepath is trivial:

        if ((rc = pci_register_driver(&pci_driver)) < 0) {
                printk("%s pci_register_driver error=%d\n", driver_name, rc);
                return rc;
        }
        pci_registered = 1;

        if (!slgt_device_list) {
                printk("%s no devices found\n",driver_name);
                pci_unregister_driver(&pci_driver);
                return -ENODEV;

slgt_device_list is NULL because no matching PCI ID is on my system (i 
dont have this hardware), so the ->probe() function did not get called 
at all.

i.e. a pure pci_register_driver() + pci_unregister_driver() sequence 
seems to cause a hang. I.e. it seems to be a pure driver-base-core 
matter.

	Ingo
-

From: Greg KH
Date: Friday, March 30, 2007 - 12:32 pm

Sorry, no, I realize how this could happen in the driver, I just don't
see what in the driver core would be keeping this driver from having
its release function called at the unregister() time.

Something has grabbed a reference to the driver...

Oh wait, is this code a module or built into the kernel?

If it's built in, there's still a reference counting bug in the
module/driver hookup logic as we really don't have a "module" yet we are
still thinking we do as we represent it in /sys/module and create the
linkages.

I created some horrible patches to try to track this down, as it was
reported on lkml (look for "Subject: kref refcounting breakage in mainline" )
but never got it working correctly.

I bet if you build that code as a module, it will work just fine, can
you try it?

Kay, did you ever get a chance to look into this reference counting
issue?

thanks,

greg k-h
-
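
The reference-counting problem described above can be illustrated with
a small user-space model: a release callback (which is what would
signal the completion that driver_unregister() waits on) runs only
when the last reference is dropped, so a lookup that returns with an
extra reference must be balanced by a put. All names below are
hypothetical and the model is not the kernel's kobject/kref code.

```c
#include <assert.h>   /* for the checks shown in the note below */
#include <stdbool.h>

struct model_obj {
	int refcount;
	bool released;	/* stands in for the completion being signalled */
};

void model_init(struct model_obj *o)
{
	o->refcount = 1;
	o->released = false;
}

struct model_obj *model_get(struct model_obj *o)
{
	o->refcount++;
	return o;
}

void model_put(struct model_obj *o)
{
	if (--o->refcount == 0)
		o->released = true;	/* release runs on the last put only */
}

/* Lookup helper that, like a find-by-name call, takes a reference. */
struct model_obj *model_find(struct model_obj *o)
{
	return model_get(o);
}
```

If a caller does model_find() and never puts the result, the final
model_put() by the owner leaves 'released' false, and anything waiting
for the release would wait forever. Pairing the find with a put lets
the last put release the object.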

From: Kay Sievers
Date: Friday, March 30, 2007 - 7:32 pm

Does the attached work for you?

Thanks,
Kay
From: Ingo Molnar
Date: Saturday, March 31, 2007 - 9:51 am

yeah, this fixed the hangs!

please push it to Andrew and Linus, we want this in v2.6.21. See the 
full patch below, with proper headers, etc.

	Ingo

--------------------->
From: Kay Sievers <kay.sievers@vrfy.org>
Subject: [patch] driver core: fix built-in drivers sysfs links

built-in drivers had broken sysfs links that caused bootup hangs for 
certain driver unregistration sequences.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 include/linux/device.h |    1 +
 kernel/module.c        |   18 ++++++++++++++----
 2 files changed, 15 insertions(+), 4 deletions(-)

Index: linux/include/linux/device.h
===================================================================
--- linux.orig/include/linux/device.h
+++ linux/include/linux/device.h
@@ -128,6 +128,7 @@ struct device_driver {
 
 	struct module		* owner;
 	const char 		* mod_name;	/* used for built-in modules */
+	struct module_kobject	* mkobj;
 
 	int	(*probe)	(struct device * dev);
 	int	(*remove)	(struct device * dev);
Index: linux/kernel/module.c
===================================================================
--- linux.orig/kernel/module.c
+++ linux/kernel/module.c
@@ -2384,8 +2384,13 @@ void module_add_driver(struct module *mo
 
 		/* Lookup built-in module entry in /sys/modules */
 		mkobj = kset_find_obj(&module_subsys.kset, drv->mod_name);
-		if (mkobj)
+		if (mkobj) {
 			mk = container_of(mkobj, struct module_kobject, kobj);
+			/* remember our module structure */
+			drv->mkobj = mk;
+			/* kset_find_obj took a reference */
+			kobject_put(mkobj);
+		}
 	}
 
 	if (!mk)
@@ -2405,17 +2410,22 @@ EXPORT_SYMBOL(module_add_driver);
 
 void module_remove_driver(struct device_driver *drv)
 {
+	struct module_kobject *mk = NULL;
 	char *driver_name;
 
 	if (!drv)
 		return;
 
 	sysfs_remove_link(&drv->kobj, "module");
-	if (drv->owner && drv->owner->mkobj.drivers_dir) {
+
+	if (drv->owner)
+		mk = &drv->owner->mkobj;
+	else if (drv->mkobj)
+		mk = drv->mkobj;
+	if (mk && ...
From: Ingo Molnar
Date: Saturday, March 31, 2007 - 9:31 am

i'll try Kay's patch.

	Ingo
-

From: Linus Torvalds
Date: Sunday, April 1, 2007 - 10:17 am

I think the whole "wait_for_completion()" is just broken. 

We asked to *unregister* the driver, not to wait for users.

I would suggest that for 2.6.21, the minimal fix is actually something 
like the appended. Comments? Ingo, does this fix things for you?

In general, I think the whole "wait for locks" or "wait for users" is 
almost always a sign of a much bigger bug in reference counting. Modules 
are special, though, since module code/data doesn't really get reference 
counted. But doing it for built-in stuff when you don't need to really 
just sounds *wrong*.

		Linus

---
 drivers/base/driver.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/drivers/base/driver.c b/drivers/base/driver.c
index 1214cbd..082bfde 100644
--- a/drivers/base/driver.c
+++ b/drivers/base/driver.c
@@ -183,7 +183,14 @@ int driver_register(struct device_driver * drv)
 void driver_unregister(struct device_driver * drv)
 {
 	bus_remove_driver(drv);
-	wait_for_completion(&drv->unloaded);
+	/*
+	 * If the driver is a module, we are probably in
+	 * the module unload path, and we want to wait
+	 * for everything to unload before we can actually
+	 * finish the unload.
+	 */
+	if (drv->owner)
+		wait_for_completion(&drv->unloaded);
 }
 
 /**
-

From: Ingo Molnar
Date: Sunday, April 1, 2007 - 10:35 am

yeah - it does the trick: i just booted the .config in question and your 
patch works fine and the bootup does not hang in slgt_init() anymore:

 Calling initcall 0xc1e78d86: slgt_init+0x0/0x1ee()
 SyncLink GT $Revision: 4.36 $
 SyncLink GT no devices found
 initcall 0xc1e78d86: slgt_init+0x0/0x1ee() returned -19
 Calling initcall 0xc1e78f74: n_hdlc_init+0x0/0x9c()
 HDLC line discipline: version $Revision: 4.8 $, maxframe=4096
 N_HDLC line discipline registered.
 initcall 0xc1e78f74: n_hdlc_init+0x0/0x9c() returned 0

thanks! Find below the full patch with metadata filled in (no other 
changes).

	Ingo

------------------------->
Subject: [patch] driver core: if built-in, do not wait in driver_unregister()
From: Linus Torvalds <torvalds@linux-foundation.org>

built-in drivers suffered bootup hangs with certain driver unregistration
sequences, due to sysfs breakage.

do the minimal fix for v2.6.21: only wait if the driver is a module.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 drivers/base/driver.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Index: linux/drivers/base/driver.c
===================================================================
--- linux.orig/drivers/base/driver.c
+++ linux/drivers/base/driver.c
@@ -183,7 +183,14 @@ int driver_register(struct device_driver
 void driver_unregister(struct device_driver * drv)
 {
 	bus_remove_driver(drv);
-	wait_for_completion(&drv->unloaded);
+	/*
+	 * If the driver is a module, we are probably in
+	 * the module unload path, and we want to wait
+	 * for everything to unload before we can actually
+	 * finish the unload.
+	 */
+	if (drv->owner)
+		wait_for_completion(&drv->unloaded);
 }
 
 /**
-

From: Greg KH
Date: Sunday, April 1, 2007 - 6:47 pm

No, I think this will catch the "hang" but look in sysfs in
/sys/module/ for the module directory for the module that "failed" to
be loaded.  I think you will see some dangling files there that are
incorrect, and might oops if you cat from them (don't remember).

Kay's patch is correct and fixes the reference count issue properly,
this one just papers over it by ignoring the fact that the driver is
never released and cleaned up in memory.
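
To make the difference between the two fixes concrete, here is a minimal userspace sketch. It is only a toy model and not the driver-core code: `toy_kobj`, `toy_find()` and `toy_unregister()` are hypothetical names, and it does not try to reproduce which kernel object actually holds the leaked reference. It assumes only the general pattern discussed in this thread: a lookup returns an object with an extra reference held, and unregister may wait for the release callback.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a refcounted kernel object (hypothetical, not struct kobject). */
struct toy_kobj {
	int refcount;
	bool released;	/* set when the release callback would have run */
};

static void toy_get(struct toy_kobj *k)
{
	k->refcount++;
}

static void toy_put(struct toy_kobj *k)
{
	if (--k->refcount == 0)
		k->released = true;	/* release callback fires on the last put */
}

/* Models a lookup that hands back the object with a reference taken. */
static struct toy_kobj *toy_find(struct toy_kobj *k)
{
	toy_get(k);
	return k;
}

struct toy_result {
	bool hangs;	/* unregister would wait forever for the release */
	bool released;	/* the object was actually cleaned up */
};

/*
 * drop_lookup_ref: put the reference taken by the lookup
 *                  (the approach of the refcount fix).
 * wait_for_release: unregister blocks until release has run
 *                   (false models skipping the wait).
 */
static struct toy_result toy_unregister(bool drop_lookup_ref, bool wait_for_release)
{
	struct toy_kobj k = { .refcount = 1, .released = false };
	struct toy_kobj *found = toy_find(&k);	/* refcount is now 2 */
	struct toy_result r;

	if (drop_lookup_ref)
		toy_put(found);
	toy_put(&k);	/* unregister drops the initial reference */

	r.released = k.released;
	r.hangs = wait_for_release && !k.released;
	return r;
}
```

In this model, `toy_unregister(false, true)` hangs, `toy_unregister(true, true)` neither hangs nor leaks, and `toy_unregister(false, false)` avoids the hang but leaves the object unreleased, which is the "papers over it" case.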

(patch included below so Kay can verify this...)

thanks,

-

From: Kok, Auke
Date: Wednesday, March 28, 2007 - 11:54 am

The issue comes from a corner case and the underlying problem is that e1000 
isn't stopping tx properly. We have a fix for this pending in our tree that I'll 
push upstream for 2.6.22 to Jeff, but I don't think this should be a blocker and 
it's probably not a regression at all; the gap has always been present.

On a side note, this is probably fixed easily by turning the adapter's 
detect_tx_hung flag off in e1000_down, so if someone spots this recurring 
somewhat regularly, please contact me so we can debug it. I myself have had a system 
suspending/resuming in circles for an hour now with traffic flying across without a 
single hit on it....

Adrian, you probably want to drop this issue from your list.

Cheers,


Auke
-

From: Ingo Molnar
Date: Wednesday, March 28, 2007 - 12:23 pm

agreed - i have done many suspend/resumes meanwhile, and this condition 
has not recurred since then. (and even when it occurred, it was 
transient)

	Ingo
-

From: Adrian Bunk
Date: Friday, March 30, 2007 - 11:04 am

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-

From: Adrian Bunk
Date: Monday, March 26, 2007 - 6:59 pm

This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either a submitter of one
of the bugs, a maintainer of an affected subsystem or driver, a patch
of yours caused a breakage, or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.


Subject    : x86_64 SMP kernel: maxcpus=1 crash in cpufreq
References : http://lkml.org/lkml/2007/3/26/54
Submitter  : Ingo Molnar <mingo@elte.hu>
Handled-By : Venki Pallipadi <venkatesh.pallipadi@intel.com>
Status     : problem is being debugged


Subject    : kernels fail to boot with drives on ATIIXP controller
             (ACPI/IRQ related)
References : https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=229621
             http://lkml.org/lkml/2007/3/4/257
Submitter  : Michal Jaegermann <michal@ellpspace.math.ualberta.ca>
Status     : unknown


Subject    : NCQ problem with ahci and Hitachi drive  (ACPI related)
References : http://lkml.org/lkml/2007/3/4/178
             http://lkml.org/lkml/2007/3/9/475
             http://lkml.org/lkml/2007/2/22/8
Submitter  : Mathieu Bérard <Mathieu.Berard@crans.org>
Handled-By : Tejun Heo <htejun@gmail.com>
Patch      : http://lkml.org/lkml/2007/2/22/8
Status     : possible patch available


Subject    : libata: PATA UDMA/100 configured as UDMA/33
References : http://lkml.org/lkml/2007/2/20/294
             http://www.mail-archive.com/linux-ide@vger.kernel.org/msg04115.html
             http://bugzilla.kernel.org/show_bug.cgi?id=8133
             http://bugzilla.kernel.org/show_bug.cgi?id=8164
             http://lkml.org/lkml/2007/3/21/330
Submitter  : Fabio Comolli <fabio.comolli@gmail.com>
             Plamen Petrov <plamen.petrov@tk.ru.acad.bg>
             Laurent Riffard <laurent.riffard@free.fr>
             Lukas Hejtmanek <xhejtman@mail.muni.cz>
Handled-By : Tejun Heo <htejun@gmail.com>
Patch      : ...
From: Laurent Riffard
Date: Wednesday, March 28, 2007 - 12:46 pm

pata-via case is fixed for me in 2.6.21-rc5-mm2 (was already fixed in 2.6.21-rc4-mm1).

thanks
~~
laurent

-

From: Fabio Comolli
Date: Thursday, March 29, 2007 - 12:02 pm

Fixed for me (ata_piix) with today's GIT (Tejun's patch got applied).
Regards,
Fabio
-

From: Adrian Bunk
Date: Monday, March 26, 2007 - 6:59 pm

This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either a submitter of one
of the bugs, a maintainer of an affected subsystem or driver, a patch
of yours caused a breakage, or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.


Subject    : Oops when changing DVB-T adapter
References : http://lkml.org/lkml/2007/3/9/212
Submitter  : CIJOML <cijoml@volny.cz>
Status     : unknown


Subject    : snd_intel8x0: divide error: 0000
References : http://lkml.org/lkml/2007/3/5/252
Submitter  : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Status     : unknown


Subject    : USB: iPod doesn't work  (CONFIG_USB_SUSPEND)
References : http://lkml.org/lkml/2007/3/21/320
Submitter  : Tino Keitel <tino.keitel@gmx.de>
Caused-By  : Marcelo Tosatti <marcelo@kvack.org>
             commit 1d619f128ba911cd3e6d6ad3475f146eb92f5c27
Handled-By : Oliver Neukum <oneukum@suse.de>
Status     : problem is being debugged


-

From: Adrian Bunk
Date: Monday, March 26, 2007 - 6:59 pm

This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either a submitter of one
of the bugs, a maintainer of an affected subsystem or driver, a patch
of yours caused a breakage, or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.


Subject    : ThinkPad X60: resume no longer works  (PCI related?)
References : http://lkml.org/lkml/2007/3/13/3
Submitter  : Dave Jones <davej@redhat.com>
             Jeremy Fitzhardinge <jeremy@goop.org>
Caused-By  : PCI merge
             commit 78149df6d565c36675463352d0bfe0000b02b7a7
Handled-By : Eric W. Biederman <ebiederm@xmission.com>
             Rafael J. Wysocki <rjw@sisk.pl>
Status     : problem is being debugged


Subject    : second suspend to disk in a row results in an oops  (MSI)
References : http://lkml.org/lkml/2007/3/17/43
             http://lkml.org/lkml/2007/3/22/150
             http://lkml.org/lkml/2007/3/26/205
             http://lkml.org/lkml/2007/3/26/76
Submitter  : Thomas Meyer <thomas@m3y3r.de>
             Frédéric Riss <frederic.riss@gmail.com>
             Marcus Better <marcus@better.se>
Handled-By : Eric W. Biederman <ebiederm@xmission.com>
Patch      : http://lkml.org/lkml/2007/3/24/136
Status     : patch was suggested


Subject    : Suspend to RAM doesn't work anymore  (ACPI?)
References : http://lkml.org/lkml/2007/3/19/128
             http://bugzilla.kernel.org/show_bug.cgi?id=8247
Submitter  : Tobias Doerffel <tobias.doerffel@gmail.com>
Handled-By : Rafael J. Wysocki <rjw@sisk.pl>
             Len Brown <len.brown@intel.com>
Status     : problem is being debugged


Subject    : s2ram autowake regression  (ACPI?)
References : http://lkml.org/lkml/2007/3/20/96
Submitter  : Pavel Machek <pavel@ucw.cz>
Handled-By : Len Brown <lenb@kernel.org>
Status     : submitter was asked to test a patch


Subject    : SATA breakage ...
From: Marcus Better
Date: Tuesday, March 27, 2007 - 1:00 am

For the sake of completeness, my bisection resulted in this:

392ee1e6dd901db6c4504617476f6442ed91f72d is first bad commit
commit 392ee1e6dd901db6c4504617476f6442ed91f72d
Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Thu Mar 8 13:04:57 2007 -0700

    [PATCH] msi: Safer state caching.

Marcus
From: Eric W. Biederman
Date: Tuesday, March 27, 2007 - 6:25 am

Right.  However if this is what Thomas was seeing the problem turned
out to be an issue with pci_enable_device changing the irq number.
It just happens that now the code cares, so the bug is found.

Marcus any chance I could see an oops?  Or you could try the patch I
previously posted when debugging this with Thomas.

I'm going to clean that patch up and send it along in hopes that it
helps anyway and see where we land.

Eric
-

From: Marcus Better
Date: Tuesday, March 27, 2007 - 9:53 am

I didn't see anything, it froze with the yellow "Linux!" sign. Any idea how to

Do you mean the one referenced above? I tried it [1] and it works.

Marcus

[1] http://permalink.gmane.org/gmane.linux.kernel/509299
From: Eric W. Biederman
Date: Tuesday, March 27, 2007 - 1:50 pm

Yes.  Sorry for being redundant. Having the bisect results after the
confirmation that the patch worked threw me.

Eric
-

From: Rafael J. Wysocki
Date: Tuesday, March 27, 2007 - 3:09 am

are related to the same issue.

The problem is that we call disable_nonboot_cpus() in swsusp before
powering down the system in order to avoid triggering the WARN_ON()
in arch/x86_64/kernel/acpi/sleep.c:init_low_mapping() and this doesn't
work well on Thomas' system.

Since the problem has been introduced by commit
94985134b7b46848267ed6b734320db01c974e72
(swsusp: disable nonboot CPUs before entering platform suspend), I think it's
better to revert this commit and remove the WARN_ON() in
arch/x86_64/kernel/acpi/sleep.c:init_low_mapping() (appended is a patch that
removes the WARN_ON()).

Greetings,
Rafael


---
Remove the WARN_ON() in arch/x86_64/kernel/acpi/sleep.c:init_low_mapping(),
which triggers every time during the suspend to disk in the platform mode, as
the potential problem it is related to doesn't seem to occur in practice.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/x86_64/kernel/acpi/sleep.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6.21-rc5/arch/x86_64/kernel/acpi/sleep.c
===================================================================
--- linux-2.6.21-rc5.orig/arch/x86_64/kernel/acpi/sleep.c
+++ linux-2.6.21-rc5/arch/x86_64/kernel/acpi/sleep.c
@@ -66,8 +66,10 @@ static void init_low_mapping(void)
 {
 	pgd_t *slot0 = pgd_offset(current->mm, 0UL);
 	low_ptr = *slot0;
+	/* FIXME: We're playing with the current task's page tables here, which
+	 * is potentially dangerous on SMP systems.
+	 */
 	set_pgd(slot0, *pgd_offset(current->mm, PAGE_OFFSET));
-	WARN_ON(num_online_cpus() != 1);
 	local_flush_tlb();
 }
 
-

From: Adrian Bunk
Date: Tuesday, March 27, 2007 - 3:29 pm

It's now in Linus' tree.

Thomas (Meyer), are there any regressions left with the latest -git tree 

cu
Adrian

-

From: Thomas Meyer
Date: Tuesday, March 27, 2007 - 3:45 pm

No, not for me.
-

From: Ingo Molnar
Date: Wednesday, March 28, 2007 - 5:19 am

i can reproduce a crash on the second suspend-to-ram, on a T60. I get a 
crash here:

 #ifdef CONFIG_PM
 static void __pci_restore_msi_state(struct pci_dev *dev)
 {
         int pos;
         u16 control;
         struct msi_desc *entry;

         if (!dev->msi_enabled)
                 return;

         entry = get_irq_msi(dev->irq);
         pos = entry->msi_attrib.pos; <-------- crash on NULL dereference


i.e. 'entry' is NULL after get_irq_msi(). (i can see the crash only on 
the VGA screen so no dump of it available. Can write down more info if  
it's helpful.)

I have tried Eric's patch above but now i always get a hang after 
"system 00:00: resuming", already upon the first suspend-resume. Not 
even the NMI watchdog can get the system out of that hang.

	Ingo
-

From: Ingo Molnar
Date: Wednesday, March 28, 2007 - 5:41 am

find below the PM log of a successful suspend/resume cycle. (I've marked 
the place that hangs with '[hard hang]')

	Ingo

---------------->
PM: Preparing system for mem sleep
Stopping tasks ... done.
psmouse serio2: suspend
psmouse serio1: suspend
atkbd serio0: suspend
i8042 i8042: suspend
sd 0:0:0:0: suspend
ide 0.0: suspend
serial8250 serial8250: suspend
platform vesafb.0: suspend
pci_express 0000:00:1c.3:pcie03: suspend
pci_express 0000:00:1c.3:pcie02: suspend
pci_express 0000:00:1c.3:pcie00: suspend
pci_express 0000:00:1c.2:pcie03: suspend
pci_express 0000:00:1c.2:pcie02: suspend
pci_express 0000:00:1c.2:pcie00: suspend
pci_express 0000:00:1c.1:pcie03: suspend
pci_express 0000:00:1c.1:pcie02: suspend
pci_express 0000:00:1c.1:pcie00: suspend
pci_express 0000:00:1c.0:pcie03: suspend
pci_express 0000:00:1c.0:pcie02: suspend
pci_express 0000:00:1c.0:pcie00: suspend
platform pcspkr: suspend
pnp 00:0a: suspend
i8042 aux 00:09: suspend
i8042 kbd 00:08: suspend
pnp 00:07: suspend
pnp 00:06: suspend
pnp 00:05: suspend
pnp 00:04: suspend
pnp 00:03: suspend
system 00:02: suspend
pnp 00:01: suspend
system 00:00: suspend
yenta_cardbus 0000:15:00.0: suspend
pci 0000:03:00.0: suspend
e1000 0000:02:00.0: suspend
pci 0000:03:00.0: resuming
yenta_cardbus 0000:15:00.0: resuming
PM: Writing back config space on device 0000:15:00.0 at offset f (was 34001ff, writing 5c0010b)
PM: Writing back config space on device 0000:15:00.0 at offset e (was 0, writing 94fc)
PM: Writing back config space on device 0000:15:00.0 at offset d (was 0, writing 9400)
PM: Writing back config space on device 0000:15:00.0 at offset c (was 0, writing 90fc)
PM: Writing back config space on device 0000:15:00.0 at offset b (was 0, writing 9000)
PM: Writing back config space on device 0000:15:00.0 at offset a (was 0, writing 8bfff000)
PM: Writing back config space on device 0000:15:00.0 at offset 9 (was 0, writing 88000000)
PM: Writing back config space on device 0000:15:00.0 at offset 8 (was 0, ...
From: Ingo Molnar
Date: Wednesday, March 28, 2007 - 6:03 am

ok, this was a red herring: the hard hang was an effect of netconsole 
combined with CONFIG_DISABLE_CONSOLE_SUSPEND. Disabling netconsole 
solved it. I'll now re-test Eric's MSI patch.

	Ingo
-

From: Ingo Molnar
Date: Wednesday, March 28, 2007 - 6:06 am

Eric's patch seems to have done the trick on my T60: i've done 10 
suspend+resumes and each worked fine. I've tidied up the description 
part of Eric's patch a bit for upstream application - find it below.

	Ingo

---------------------->
Subject: [patch] MSI-X: fix resume crash
From: Eric W. Biederman <ebiederm@xmission.com>

I think the right solution is to simply make pci_enable_device just flip 
enable bits and move the rest of the work someplace else.

However a thorough cleanup is a little extreme for this point in the 
release cycle, so I think a quick hack that makes the code not stomp the 
irq when msi irq's are enabled should be the first fix.  Then we can 
later make the code not change the irqs at all.
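
As a rough illustration of why the guard helps, here is a toy userspace model. None of these names are kernel APIs: `toy_pci_dev`, `toy_enable_device()` and `toy_get_irq_msi()` are hypothetical stand-ins, and the irq numbers are made up. The only assumptions taken from this thread are that re-enabling the device on resume can overwrite dev->irq, and that the MSI restore path then looks up its descriptor by dev->irq.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_LEGACY_IRQ 11	/* hypothetical pin-based irq number */
#define TOY_MSI_IRQ    500	/* hypothetical irq number handed out for MSI */

struct toy_msi_desc {
	int pos;
};

struct toy_pci_dev {
	int irq;
	bool msi_enabled;
};

static struct toy_msi_desc toy_desc = { .pos = 0x50 };

/* Stand-in for a lookup by irq: only the MSI irq has a descriptor. */
static struct toy_msi_desc *toy_get_irq_msi(int irq)
{
	return irq == TOY_MSI_IRQ ? &toy_desc : NULL;
}

/*
 * Stand-in for the arch enable hook. Without the guard it always
 * (re)assigns the legacy irq, stomping the MSI irq stored in dev->irq.
 */
static void toy_enable_device(struct toy_pci_dev *dev, bool guard_msi)
{
	if (guard_msi && dev->msi_enabled)
		return;		/* leave dev->irq alone while MSI is enabled */
	dev->irq = TOY_LEGACY_IRQ;
}

/* Returns true if the restore path on resume would find its descriptor. */
static bool toy_resume_finds_msi_desc(bool guard_msi)
{
	struct toy_pci_dev dev = { .irq = TOY_MSI_IRQ, .msi_enabled = true };

	toy_enable_device(&dev, guard_msi);	/* resume: re-enable the device */
	return toy_get_irq_msi(dev.irq) != NULL;
}
```

With the guard, `toy_resume_finds_msi_desc(true)` finds the descriptor. Without it, the lookup returns NULL, which corresponds to the NULL dereference reported earlier in the thread.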

Tony, Len: the way pci_disable_device is being used in a suspend/resume 
path by a few drivers is completely incompatible with the way irqs are 
allocated on ia64.  In particular, the following sequence occurs 
in several drivers.

probe:
  pci_enable_device(pdev);
  request_irq(pdev->irq);
suspend:
  pci_disable_device(pdev);
resume:
  pci_enable_device(pdev);
remove:
  free_irq(pdev->irq);
  pci_disable_device(pdev);

What I'm proposing we do is move the irq allocation code out of 
pci_enable_device and the irq freeing code out of pci_disable_device in 
the future.  If we move ia64 to a model where the irq number equals the 
gsi, like we have for x86_64 and are in the middle of for i386, that 
should be pretty straightforward.  It would even be relatively simple 
to delay vector allocation in that context until request_irq, if we 
needed the delayed allocation benefit.  Do you two have any problems 
with moving in that direction?

If fixing the arch code is unacceptable for some reason I'm not aware of 
we need to audit the 10-20 drivers that call pci_disable_device in their 
suspend/resume processing and ensure that they have freed all of the 
irqs before that point.  Given that I have bug reports on the msi path I 
know that isn't ...
From: Len Brown
Date: Wednesday, March 28, 2007 - 9:30 pm

There are no IA64 machines that support system suspend/resume today --
so you have 0 chance of breaking the IA64 suspend/resume installed base.

My understanding is that Luming Yu has cobbled IA64 S4 support

I think consistency here would be _wonderful_.
Of course the beauty of having identity GSI=IRQ and a /proc/interrupts
that tells you what IOAPIC pin you are using becomes moot with MSI --
but hey, showing the IRQ number rather than the vector number

I think the suspend/resume interrupt logic needs some serious attention.
We've had several schemes for suspend/resume of interrupts, several
changes in strategy, and right now I think we are inconsistent,
and frankly, I'm amazed it works at all.

-

From: Eric W. Biederman
Date: Wednesday, March 28, 2007 - 9:57 pm

Yes.  It also allows for bigger machines.  And I can get a consistent
number out of MSI if we allocate irq numbers in a sufficiently non-sparse
way.  Something like bus|device|func|irq which is 8+5+3+12 or 28 bits...
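
For what it's worth, the 8+5+3+12 layout could be sketched like this in plain C. This is only an illustration of the bit arithmetic, not a proposed kernel interface: the `toy_` names and the field order (bus in the top bits, irq in the low 12) are assumptions.

```c
#include <stdint.h>

/* Field widths from the suggested layout: bus|device|func|irq = 8+5+3+12 bits. */
#define TOY_IRQ_BITS	12
#define TOY_FUNC_BITS	3
#define TOY_DEV_BITS	5
#define TOY_BUS_BITS	8

#define TOY_FUNC_SHIFT	TOY_IRQ_BITS				/* 12 */
#define TOY_DEV_SHIFT	(TOY_FUNC_SHIFT + TOY_FUNC_BITS)	/* 15 */
#define TOY_BUS_SHIFT	(TOY_DEV_SHIFT + TOY_DEV_BITS)		/* 20 */

/* Mask with the low 'bits' bits set. */
static uint32_t toy_mask(unsigned int bits)
{
	return (1u << bits) - 1u;
}

/* Pack the four fields into one 28-bit number; out-of-range bits are dropped. */
static uint32_t toy_pack_irq(uint32_t bus, uint32_t dev, uint32_t func, uint32_t irq)
{
	return ((bus & toy_mask(TOY_BUS_BITS)) << TOY_BUS_SHIFT) |
	       ((dev & toy_mask(TOY_DEV_BITS)) << TOY_DEV_SHIFT) |
	       ((func & toy_mask(TOY_FUNC_BITS)) << TOY_FUNC_SHIFT) |
	       (irq & toy_mask(TOY_IRQ_BITS));
}

/* Recover the bus field from a packed number. */
static uint32_t toy_unpack_bus(uint32_t packed)
{
	return (packed >> TOY_BUS_SHIFT) & toy_mask(TOY_BUS_BITS);
}
```

With every field at its maximum the packed value is 0x0fffffff, i.e. exactly 28 bits.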

What I have been doing lately is to aim at consistency in how a function
is called (and thus how it is expected to be used) and how it is actually
implemented.  When I have a choice I try to pick a forgiving implementation
so that driver writers don't have to follow a magic correct path for
things to work correctly.  

Removing the irq assignment from pci_enable_device is something that
matches implementation with use.

As for the rest it seems reasonable to me to allow an irq to be held
requested over suspend/resume and to save and restore apic and msi
capability state.  Especially since irq numbers are a kernel
abstraction we should be able to do with them what we need to.

Honestly the whole suspend/resume thing is beyond me at this point, as I'm
laptop free...  But I do know how to make code consistent with itself.

Eric
-

From: Eric W. Biederman
Date: Wednesday, March 28, 2007 - 6:31 am

Thanks.  Tidying up the description has been on my todo list for the
last little bit but I just haven't gotten there.

I've gotten at least Tony's sign off on the architectural direction
so there is nothing to prevent this patch from going in.  Unless
Linus or someone wants a more thorough patch this late in the
release cycle.

-

From: Ingo Molnar
Date: Wednesday, March 28, 2007 - 6:36 am

i've updated the patch below with your sign-off, and trimmed the 
description to the relevant bits, for easy upstream application.

	Ingo

------------->
Subject: [patch] MSI-X: fix resume crash
From: Eric W. Biederman <ebiederm@xmission.com>

So I think the right solution is to simply make pci_enable_device just 
flip enable bits and move the rest of the work someplace else.

However a thorough cleanup is a little extreme for this point in the 
release cycle, so I think a quick hack that makes the code not stomp the 
irq when msi irq's are enabled should be the first fix.  Then we can 
later make the code not change the irqs at all.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/cris/arch-v32/drivers/pci/bios.c |    4 +++-
 arch/frv/mb93090-mb00/pci-vdk.c       |    3 ++-
 arch/i386/pci/common.c                |    6 ++++--
 arch/ia64/pci/pci.c                   |    8 ++++++--
 4 files changed, 15 insertions(+), 6 deletions(-)

Index: linux/arch/cris/arch-v32/drivers/pci/bios.c
===================================================================
--- linux.orig/arch/cris/arch-v32/drivers/pci/bios.c
+++ linux/arch/cris/arch-v32/drivers/pci/bios.c
@@ -100,7 +100,9 @@ int pcibios_enable_device(struct pci_dev
 	if ((err = pcibios_enable_resources(dev, mask)) < 0)
 		return err;
 
-	return pcibios_enable_irq(dev);
+	if (!dev->msi_enabled)
+		pcibios_enable_irq(dev);
+	return 0;
 }
 
 int pcibios_assign_resources(void)
Index: linux/arch/frv/mb93090-mb00/pci-vdk.c
===================================================================
--- linux.orig/arch/frv/mb93090-mb00/pci-vdk.c
+++ linux/arch/frv/mb93090-mb00/pci-vdk.c
@@ -466,6 +466,7 @@ int pcibios_enable_device(struct pci_dev
 
 	if ((err = pcibios_enable_resources(dev, mask)) < 0)
 		return err;
-	pcibios_enable_irq(dev);
+	if (!dev->msi_enabled)
+		pcibios_enable_irq(dev);
 	return 0;
 }
Index: ...
From: Adrian Bunk
Date: Monday, March 26, 2007 - 6:59 pm

This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either a submitter of one
of the bugs, a maintainer of an affected subsystem or driver, a patch
of yours caused a breakage, or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.


Subject    : MacMini doesn't come out of suspend to ram  (i386 clockevents)
             (CONFIG_HPET_TIMER)
References : http://lkml.org/lkml/2007/3/21/374
Submitter  : Frédéric Riss <frederic.riss@gmail.com>
             Tino Keitel <tino.keitel@gmx.de>
Caused-By  : Thomas Gleixner <tglx@linutronix.de>
             commit e9e2cdb412412326c4827fc78ba27f410d837e6e
Status     : unknown


Subject    : MacBook Core Duo: suspend to disk hangs
References : http://bugzilla.kernel.org/show_bug.cgi?id=8224
Submitter  : Mike Harris <atarimike@wavecable.com>
Status     : unknown


Subject    : Dynticks and High resolution Timer hang boot during IDE detection
             workaround: clocksource=acpi_pm
References : http://lkml.org/lkml/2007/3/7/504
Submitter  : Stephane Casset <sept@logidee.com>
Caused-By  : John Stultz <johnstul@us.ibm.com>
             commit 6bb74df481223731af6c7e0ff3adb31f6442cfcd
Handled-By : John Stultz <johnstul@us.ibm.com>
Patch      : http://lkml.org/lkml/2007/3/22/287
Status     : workaround-patch available


Subject    : after resume: X hangs after drawing a couple of windows
             workaround: clocksource=acpi_pm
References : http://lkml.org/lkml/2007/3/8/117
             http://lkml.org/lkml/2007/3/25/20
             http://lkml.org/lkml/2007/3/26/151
Submitter  : Michael S. Tsirkin <mst@mellanox.co.il>
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Status     : problem is being debugged


Subject    : first disk access after resume takes several minutes
             resume from RAM broken
             'date' does not advance ...
From: Adrian Bunk
Date: Friday, March 30, 2007 - 2:32 pm

This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either a submitter of one
of the bugs, a maintainer of an affected subsystem or driver, a patch
of yours caused a breakage, or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.


Subject    : crashes in KDE
References : http://bugzilla.kernel.org/show_bug.cgi?id=8157
Submitter  : Oliver Pinter <oliver.pntr@gmail.com>
Status     : unknown


Subject    : hung bootup in various drivers
References : http://lkml.org/lkml/2007/3/30/68
Submitter  : Ingo Molnar <mingo@elte.hu>
Handled-By : Ingo Molnar <mingo@elte.hu>
             Greg KH <gregkh@suse.de>
             Kay Sievers <kay.sievers@vrfy.org>
Status     : problem is being discussed


Subject    : kernels fail to boot with drives on ATIIXP controller
             (ACPI/IRQ related)
References : https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=229621
             http://lkml.org/lkml/2007/3/4/257
Submitter  : Michal Jaegermann <michal@ellpspace.math.ualberta.ca>
Status     : unknown


Subject    : NCQ problem with ahci and Hitachi drive
References : http://lkml.org/lkml/2007/3/4/178
             http://lkml.org/lkml/2007/3/9/475
             http://lkml.org/lkml/2007/2/22/8
Submitter  : Mathieu Bérard <Mathieu.Berard@crans.org>
Handled-By : Tejun Heo <htejun@gmail.com>
Patch      : http://lkml.org/lkml/2007/2/22/8
Status     : possible patch available



-

From: Greg KH
Date: Friday, March 30, 2007 - 2:38 pm

Note, this should probably read:
	hung bootup for drivers built into the kernel, that fail their
	module_init() call.

A much smaller minority of cases :)

thanks,

greg k-h
-

From: Michal Jaegermann
Date: Friday, March 30, 2007 - 5:23 pm

I now have an even better one with pata_via.  A kernel, which for
all practical purposes is 2.6.21-rc5, not only refuses to boot
(and I cannot find any option combination which would allow me to
do so anyway) but simply refuses to read _any_ data from the media.
This includes partitioning information.

Earlier kernel on the same hardware boots without raising any fuss.

Details are collected as
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=234650

   Michal
-

From: Adrian Bunk
Date: Saturday, March 31, 2007 - 8:01 am

cu
Adrian

-

From: Michal Jaegermann
Date: Saturday, March 31, 2007 - 9:42 am

You mean that a quoted report talks about 2.6.20-1.3025.fc7 kernel?
These are vagaries of kernel version numbering in Fedora.
Changelogs are not that clear but it appears that
2.6.19-1.2911.6.4.fc6 will be actually closer to 2.6.20.
That kernel from a bug report is really, for all intents and purposes,
2.6.21-rc5 (if I am not misreading something).

I am afraid that I do not have at this moment an easy way to check
"plain" 2.6.20 on the hardware in question.  It appears that the
essential difference is that a working kernel is using an old IDE
driver, and sees the drive - in this case - as /dev/hdc, while the
current one tries to go through libata and chokes uncontrollably.

   Michal
-

From: Adrian Bunk
Date: Friday, March 30, 2007 - 2:32 pm

This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either a submitter of one
of the bugs, a maintainer of an affected subsystem or driver, a patch
of yours caused a breakage, or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.


Subject    : snd_hda_intel doesn't work with ASUS M2V mainboard
References : http://bugzilla.kernel.org/show_bug.cgi?id=8273
Submitter  : Hans-Georg Rist <hg.rist@web.de>
Status     : unknown


Subject    : snd_intel8x0: divide error: 0000
References : http://lkml.org/lkml/2007/3/5/252
Submitter  : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Status     : unknown


Subject    : hal daemon crashes after pulling a USB serial device
References : http://www.opensubscriber.com/message/linux-usb-devel@lists.sourceforge.net/6369800.html
Submitter  : Andi Kleen <ak@suse.de>
Handled-By : Oliver Neukum <oneukum@suse.de>
Status     : problem is being debugged


Subject    : USB: iPod doesn't work  (CONFIG_USB_SUSPEND)
References : http://lkml.org/lkml/2007/3/21/320
Submitter  : Tino Keitel <tino.keitel@gmx.de>
Caused-By  : Marcelo Tosatti <marcelo@kvack.org>
             commit 1d619f128ba911cd3e6d6ad3475f146eb92f5c27
Handled-By : Oliver Neukum <oneukum@suse.de>
Status     : problem is being debugged


Subject    : USB: Oops when changing DVB-T adapter
References : http://lkml.org/lkml/2007/3/9/212
Submitter  : CIJOML <cijoml@volny.cz>
Status     : unknown


Subject    : forcedeth: sporadic under-load crashes
References : http://lkml.org/lkml/2007/3/26/63
Submitter  : Ingo Molnar <mingo@elte.hu>
Handled-By : Ingo Molnar <mingo@elte.hu>
             Ayaz Abdulla <aabdulla@nvidia.com>
Status     : problem is being debugged


-

From: Adrian Bunk
Date: Friday, March 30, 2007 - 2:32 pm

This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.


Subject    : ThinkPad X60: resume no longer works  (PCI related?)
References : http://lkml.org/lkml/2007/3/13/3
Submitter  : Dave Jones <davej@redhat.com>
             Jeremy Fitzhardinge <jeremy@goop.org>
Caused-By  : PCI merge
             commit 78149df6d565c36675463352d0bfe0000b02b7a7
Handled-By : Eric W. Biederman <ebiederm@xmission.com>
             Rafael J. Wysocki <rjw@sisk.pl>
Status     : problem is being debugged


Subject    : Suspend to RAM doesn't work anymore  (ACPI?)
References : http://lkml.org/lkml/2007/3/19/128
             http://bugzilla.kernel.org/show_bug.cgi?id=8247
Submitter  : Tobias Doerffel <tobias.doerffel@gmail.com>
Handled-By : Rafael J. Wysocki <rjw@sisk.pl>
             Len Brown <len.brown@intel.com>
Status     : problem is being debugged


Subject    : SATA breakage on resume
References : http://lkml.org/lkml/2007/3/7/233
Submitter  : Thomas Gleixner <tglx@linutronix.de>
             Soeren Sonnenburg <kernel@nn7.de>
Status     : unknown


Subject    : resume from RAM corrupts vesafb console
References : http://lkml.org/lkml/2007/3/26/76
Submitter  : Marcus Better <marcus@better.se>
Handled-By : Pavel Machek <pavel@ucw.cz>
Status     : problem is being debugged


Subject    : ThinkPad doesn't resume from suspend to RAM
References : http://lkml.org/lkml/2007/2/27/80
             http://lkml.org/lkml/2007/2/28/348
Submitter  : Jens Axboe <jens.axboe@oracle.com>
             Jeff Chua <jeff.chua.linux@gmail.com>
Status     : unknown


Subject    : MacBook Core Duo: suspend to memory wakeup hang
References : ...
-

From: Jeremy Fitzhardinge
Date: Saturday, March 31, 2007 - 10:39 pm

I know this is currently a subject of discussion on lkml, but I wanted
to confirm that booting with "hpet=disable" fixes this, and resume works
for me.  It ends up using acpi_pm as the clocksource.

    J
-

From: Jeff Chua
Date: Friday, March 30, 2007 - 7:52 pm

Fixed with CONFIG_NO_HZ unset and patch from Maxim
(http://lkml.org/lkml/2007/3/29/108).

Thanks,
Jeff.
-

From: Adrian Bunk
Date: Friday, March 30, 2007 - 8:16 pm

Thanks for this information.


cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-

From: Jens Axboe
Date: Saturday, March 31, 2007 - 4:08 am

Yep, it does, no problems since the last -rc.

-- 
Jens Axboe

-

From: Michal Piotrowski
Date: Friday, April 13, 2007 - 9:32 am

This problem is fixed in 2.6.21-rc6-git5 (commit
692412b31ffb5df00197ea591dd635fc07506c02).

Huge thanks to Stephen.

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-

From: Adrian Bunk
Date: Friday, March 30, 2007 - 2:49 pm

[ this time with a Cc... ]

This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.


Subject    : MacMini doesn't come out of suspend to ram  (i386 clockevents)
             (CONFIG_HPET_TIMER)
References : http://lkml.org/lkml/2007/3/21/374
Submitter  : Frédéric Riss <frederic.riss@gmail.com>
             Tino Keitel <tino.keitel@gmx.de>
Caused-By  : Thomas Gleixner <tglx@linutronix.de>
             commit e9e2cdb412412326c4827fc78ba27f410d837e6e
Status     : unknown


Subject    : after resume: X hangs after drawing a couple of windows
             workaround: clocksource=acpi_pm
References : http://lkml.org/lkml/2007/3/8/117
             http://lkml.org/lkml/2007/3/25/20
             http://lkml.org/lkml/2007/3/26/151
Submitter  : Michael S. Tsirkin <mst@mellanox.co.il>
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Status     : problem is being debugged


Subject    : system doesn't come out of suspend  (CONFIG_NO_HZ)
References : http://lkml.org/lkml/2007/2/22/391
Submitter  : Michael S. Tsirkin <mst@mellanox.co.il>
             Soeren Sonnenburg <kernel@nn7.de>
Handled-By : Thomas Gleixner <tglx@linutronix.de>
             Ingo Molnar <mingo@elte.hu>
             Tejun Heo <htejun@gmail.com>
             Rafael J. Wysocki <rjw@sisk.pl>
Status     : unknown


Subject    : suspend to disk hangs  (CONFIG_NO_HZ)
References : http://lkml.org/lkml/2007/3/25/217
Submitter  : Jeff Chua <jeff.chua.linux@gmail.com>
Status     : unknown


-

From: Frédéric
Date: Friday, March 30, 2007 - 11:44 pm

This one has been fixed by 399afa4fc9238fbae42116cf25a54671c0e8f56e.
Suspend to ram now works with HPET enabled (and regardless of the NO_HZ
setting).

Thanks!

Fred.

-

From: Michael S. Tsirkin
Date: Sunday, April 1, 2007 - 12:04 am

Adrian,
the bug was found by Maxim Levitsky and
the following patch appears to have fixed the problem:
http://lkml.org/lkml/2007/3/28/104

the right way to fix it is still being discussed:
http://lkml.org/lkml/2007/3/28/182


-- 
MST
-

From: Michael S. Tsirkin
Date: Sunday, April 1, 2007 - 1:37 pm

Seems to be resolved with 399afa4fc9238fbae42116cf25a54671c0e8f56e.
Thanks Maxim!


-- 
MST
-

From: Jeff Chua
Date: Friday, March 30, 2007 - 7:41 pm

Still broken on 2.6.21-rc5.

Jeff.
-

From: Adrian Bunk
Date: Saturday, March 31, 2007 - 11:19 am

This email lists some known regressions in Linus' tree compared to 2.6.20
with patches available.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.


Subject    : hung bootup in various drivers
References : http://lkml.org/lkml/2007/3/30/68
Submitter  : Ingo Molnar <mingo@elte.hu>
Handled-By : Kay Sievers <kay.sievers@vrfy.org>
Patch      : http://lkml.org/lkml/2007/3/30/323
Status     : patch available


Subject    : NCQ problem with ahci and Hitachi drive
References : http://lkml.org/lkml/2007/3/4/178
             http://lkml.org/lkml/2007/3/9/475
             http://lkml.org/lkml/2007/2/22/8
Submitter  : Mathieu Bérard <Mathieu.Berard@crans.org>
Handled-By : Tejun Heo <htejun@gmail.com>
             Robert Hancock <hancockr@shaw.ca>
Patch      : http://lkml.org/lkml/2007/2/22/8
Status     : possible patch available


Subject    : suspend to disk hangs  (microcode driver)
References : http://lkml.org/lkml/2007/3/16/126
Submitter  : Maxim Levitsky <maximlevitsky@gmail.com>
Caused-By  : Rafael J. Wysocki <rjw@sisk.pl>
             commit e3c7db621bed4afb8e231cb005057f2feb5db557
             commit ed746e3b18f4df18afa3763155972c5835f284c5
             commit 259130526c267550bc365d3015917d90667732f1
Handled-By : Rafael J. Wysocki <rjw@sisk.pl>
Patch      : http://lkml.org/lkml/2007/3/25/71
Status     : patch available

-

From: Robert Hancock
Date: Monday, April 2, 2007 - 9:05 pm

This adds some NCQ blacklist entries taken from the Silicon Image 3124/3132
Windows driver .inf files. There are some confirming reports of problems
with these drives under Linux (for example http://lkml.org/lkml/2007/3/4/178)
so let's disable NCQ on these drives.

Signed-off-by: Robert Hancock <hancockr@shaw.ca>

--- linux-2.6.21-rc5-git9/drivers/ata/libata-core.c	2007-04-02 21:03:29.000000000 -0600
+++ linux-2.6.21-rc5-git9edit/drivers/ata/libata-core.c	2007-04-02 21:26:23.000000000 -0600
@@ -3363,6 +3363,11 @@ static const struct ata_blacklist_entry 
 	{ "Maxtor 6L250S0",     "BANC1G10",     ATA_HORKAGE_NONCQ },
 	/* NCQ hard hangs device under heavier load, needs hard power cycle */
 	{ "Maxtor 6B250S0",	"BANC1B70",	ATA_HORKAGE_NONCQ },
+	/* Blacklist entries taken from Silicon Image 3124/3132
+	   Windows driver .inf file - also several Linux problem reports */
+	{ "HTS541060G9SA00",    "MB3OC60D",     ATA_HORKAGE_NONCQ, },
+	{ "HTS541080G9SA00",    "MB4OC60D",     ATA_HORKAGE_NONCQ, },
+	{ "HTS541010G9SA00",    "MBZOC60D",     ATA_HORKAGE_NONCQ, },
 
 	/* Devices with NCQ limits */
 

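As a rough illustration of what the blacklist entries in the patch above are for, here is a minimal user-space sketch of a model/firmware lookup. It is not the libata implementation: the struct layout, the `lookup_horkage` name, the `HORKAGE_NONCQ` value and the exact-string matching are assumptions made for the example. Only the drive and firmware strings are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag value, standing in for the kernel's ATA_HORKAGE_NONCQ. */
#define HORKAGE_NONCQ (1u << 2)

struct blacklist_entry {
	const char *model;     /* drive model string */
	const char *firmware;  /* firmware revision, or NULL to match any */
	unsigned int horkage;  /* quirk flags to apply on a match */
};

/* Entries copied from the patch context and the added lines. */
static const struct blacklist_entry blacklist[] = {
	{ "Maxtor 6L250S0",  "BANC1G10", HORKAGE_NONCQ },
	{ "Maxtor 6B250S0",  "BANC1B70", HORKAGE_NONCQ },
	{ "HTS541060G9SA00", "MB3OC60D", HORKAGE_NONCQ },
	{ "HTS541080G9SA00", "MB4OC60D", HORKAGE_NONCQ },
	{ "HTS541010G9SA00", "MBZOC60D", HORKAGE_NONCQ },
	{ NULL, NULL, 0 }  /* terminator */
};

/* Return the quirk flags for a drive, or 0 if it is not listed. */
unsigned int lookup_horkage(const char *model, const char *firmware)
{
	const struct blacklist_entry *e;

	assert(model != NULL && firmware != NULL);

	for (e = blacklist; e->model != NULL; e++) {
		if (strcmp(e->model, model) != 0)
			continue;
		/* A NULL firmware field in the table matches any revision. */
		if (e->firmware == NULL || strcmp(e->firmware, firmware) == 0)
			return e->horkage;
	}
	return 0;
}
```

In this sketch a caller would check `lookup_horkage(model, fw) & HORKAGE_NONCQ` before enabling NCQ; a drive with a listed model but a different firmware revision is left alone.
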
-

From: Tejun Heo
Date: Monday, April 2, 2007 - 9:13 pm

Acked-by: Tejun Heo <htejun@gmail.com>

-- 
tejun
-

From: Jeff Garzik
Date: Tuesday, April 3, 2007 - 11:09 pm

The thread you link to seems like an irq problem, especially because it 
worked in 2.6.20 and prior?

	Jeff



-

From: Robert Hancock
Date: Wednesday, April 4, 2007 - 7:26 am

According to this post:

http://lkml.org/lkml/2007/3/9/475

with 2.6.21-rc3, it started working after the kernel disabled NCQ 
because of too many errors. That seems to point away from it being an 
IRQ problem, as you'd expect it to not work at all. I don't expect the 
interrupts would be handled any differently between NCQ and non-NCQ 
commands. However, apparently disabling ACPI also prevents the problem, 
which does seem a bit odd.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/

-

From: Jeff Garzik
Date: Monday, March 26, 2007 - 10:51 pm

[just got back from vacation, or would have sent this earlier]

FWIW, I'm still leaning towards disabling libata ACPI support by default 
for 2.6.21.

Upstream has Alan's fix for the worst PATA problems, but for different 
reasons, I think PATA ACPI and SATA ACPI support in libata does not feel 
quite ready for prime time in 2.6.21.

Scream now, or hold your peace until 2.6.22... :)

	Jeff


-

From: Tejun Heo
Date: Monday, March 26, 2007 - 10:54 pm

I second disabling ACPI for 2.6.21.

-- 
tejun
-

From: Pavel Machek
Date: Tuesday, March 27, 2007 - 2:32 pm

Ugh.. does that mean we'll have 'regression reports' as in 'it worked
ok in -rc5, broken in final'?

Well, suspend is currently so broken that we'll be flooded by reports,
anyway, but.... could we at least get a define in the code so that we can
tell users to flip it?

Or maybe it is enough to make libata dependent on EXPERIMENTAL?
...making it dependent on BROKEN should definitely be enough...

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Tejun Heo
Date: Wednesday, March 28, 2007 - 2:51 am

Just the default value for libata.noacpi is changed to 1, so users can
easily re-enable it by passing a boot/module parameter.

-- 
tejun
-

From: Linus Torvalds
Date: Tuesday, March 27, 2007 - 10:07 am

Hey, I'm not going to argue against anything that says "disable ACPI". Of 
*course* it should be disabled if there aren't thousands of machines that 
are in user hands that actually need it (and none that regress).

Anybody want to send me a patch?

		Linus
-

From: Jeff Garzik
Date: Tuesday, March 27, 2007 - 11:48 am

It's required to access data at all (BIOS-supplied password [un]locks 
disk), in a small minority of configurations.  It's strongly suggested 
for reliable suspend/resume, particularly on laptops, where libata ACPI 
support fixes some suspend/resume problems.

Some BIOSen also want to apply drive+board-specific errata workarounds.
That's OK, but ideally we should know about those in the kernel.

"none that regress" is the problem though.  Buggy tables, unexercised 
ACPI code paths, and in a few cases unexpected post-ACPI 

Since everybody is OK with my plan, I'll send one today along with the 
rest of the post-vacation 2.6.21-rc bug fixes.

	Jeff


-

From: Michal Piotrowski
Date: Tuesday, March 27, 2007 - 11:53 am

I found this in mm snapshot
http://www.ussg.iu.edu/hypermail/linux/kernel/0703.2/1367.html
it's in mainline too.

Andi, any progress with this bug?

BUG: using smp_processor_id() in preemptible [00000001] code: mount/7245
caller is avail_to_resrv_perfctr_nmi_bit+0x1a/0x32
 [<c0105039>] show_trace_log_lvl+0x1a/0x2f
 [<c0105720>] show_trace+0x12/0x14
 [<c01057d2>] dump_stack+0x16/0x18
 [<c01f911e>] debug_smp_processor_id+0xa2/0xb4
 [<c0115832>] avail_to_resrv_perfctr_nmi_bit+0x1a/0x32
 [<fdca69b5>] nmi_create_files+0x2a/0x10e [oprofile]
 [<fdca5f4e>] oprofile_create_files+0xe6/0xec [oprofile]
 [<fdca6153>] oprofilefs_fill_super+0x78/0x7e [oprofile]
 [<c0173516>] get_sb_single+0x46/0x8c
 [<fdca608b>] oprofilefs_get_sb+0x1c/0x1e [oprofile]
 [<c0173406>] vfs_kern_mount+0x81/0xf1
 [<c01734be>] do_kern_mount+0x30/0x42
 [<c0185be9>] do_mount+0x601/0x678
 [<c0185ccf>] sys_mount+0x6f/0xa4
 [<c0104060>] syscall_call+0x7/0xb
 =======================
BUG: using smp_processor_id() in preemptible [00000001] code: mount/7245
caller is avail_to_resrv_perfctr_nmi_bit+0x1a/0x32
 [<c0105039>] show_trace_log_lvl+0x1a/0x2f
 [<c0105720>] show_trace+0x12/0x14
 [<c01057d2>] dump_stack+0x16/0x18
 [<c01f911e>] debug_smp_processor_id+0xa2/0xb4
 [<c0115832>] avail_to_resrv_perfctr_nmi_bit+0x1a/0x32
 [<fdca69b5>] nmi_create_files+0x2a/0x10e [oprofile]
 [<fdca5f4e>] oprofile_create_files+0xe6/0xec [oprofile]
 [<fdca6153>] oprofilefs_fill_super+0x78/0x7e [oprofile]
 [<c0173516>] get_sb_single+0x46/0x8c
 [<fdca608b>] oprofilefs_get_sb+0x1c/0x1e [oprofile]
 [<c0173406>] vfs_kern_mount+0x81/0xf1
 [<c01734be>] do_kern_mount+0x30/0x42
 [<c0185be9>] do_mount+0x601/0x678
 [<c0185ccf>] sys_mount+0x6f/0xa4
 [<c0104060>] syscall_call+0x7/0xb
 =======================
BUG: using smp_processor_id() in preemptible [00000001] code: mount/7245
caller is avail_to_resrv_perfctr_nmi_bit+0x1a/0x32
 [<c0105039>] show_trace_log_lvl+0x1a/0x2f
 [<c0105720>] show_trace+0x12/0x14
 [<c01057d2>] dump_stack+0x16/0x18
 ...
-

From: Andi Kleen
Date: Wednesday, March 28, 2007 - 7:30 am

Can you test this patch please? 

-Andi

i386/x86-64: Convert nmi reservation to be global

It doesn't make much sense to have this per CPU, because all
the services using NMIs run on all CPUs. So make it global.

This also fixes a warning about unprotected use of smp_processor_id
on preemptible kernels.

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux/arch/i386/kernel/nmi.c
===================================================================
--- linux.orig/arch/i386/kernel/nmi.c
+++ linux/arch/i386/kernel/nmi.c
@@ -41,8 +41,8 @@ int nmi_watchdog_enabled;
  *   different subsystems this reservation system just tries to coordinate
  *   things a little
  */
-static DEFINE_PER_CPU(unsigned long, perfctr_nmi_owner);
-static DEFINE_PER_CPU(unsigned long, evntsel_nmi_owner[3]);
+static unsigned long perfctr_nmi_owner;
+static unsigned long evntsel_nmi_owner[3];
 
 static cpumask_t backtrace_mask = CPU_MASK_NONE;
 
@@ -124,7 +124,7 @@ int avail_to_resrv_perfctr_nmi_bit(unsig
 {
 	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
 
-	return (!test_bit(counter, &__get_cpu_var(perfctr_nmi_owner)));
+	return (!test_bit(counter, &perfctr_nmi_owner));
 }
 
 /* checks the an msr for availability */
@@ -135,7 +135,7 @@ int avail_to_resrv_perfctr_nmi(unsigned 
 	counter = nmi_perfctr_msr_to_bit(msr);
 	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
 
-	return (!test_bit(counter, &__get_cpu_var(perfctr_nmi_owner)));
+	return (!test_bit(counter, &perfctr_nmi_owner));
 }
 
 int reserve_perfctr_nmi(unsigned int msr)
@@ -145,7 +145,7 @@ int reserve_perfctr_nmi(unsigned int msr
 	counter = nmi_perfctr_msr_to_bit(msr);
 	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
 
-	if (!test_and_set_bit(counter, &__get_cpu_var(perfctr_nmi_owner)))
+	if (!test_and_set_bit(counter, &perfctr_nmi_owner))
 		return 1;
 	return 0;
 }
@@ -157,7 +157,7 @@ void release_perfctr_nmi(unsigned int ms
 	counter = nmi_perfctr_msr_to_bit(msr);
 	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
 
-	clear_bit(counter, ...
-

From: Michal Piotrowski
Date: Wednesday, March 28, 2007 - 7:56 am

BUG: using smp_processor_id() in preemptible [00000001] code: mount/7245 
is fixed, thanks.


but I still get this

[  208.523901] =================================
[  208.529739] [ INFO: inconsistent lock state ]
[  208.534087] 2.6.21-rc5-g28defbea-dirty #131
[  208.538260] ---------------------------------
[  208.542611] inconsistent {hardirq-on-W} -> {in-hardirq-W} usage.
[  208.548600] swapper/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
[  208.553553]  (oprofilefs_lock){+-..}, at: [<fdc85b6a>] nmi_cpu_setup+0x15/0x4f [oprofile]
[  208.561800] {hardirq-on-W} state was registered at:
[  208.566665]   [<c013dba4>] __lock_acquire+0x442/0xba1
[  208.571765]   [<c013e36b>] lock_acquire+0x68/0x82
[  208.576519]   [<c031bb37>] _spin_lock+0x35/0x42
[  208.581102]   [<fdc85343>] oprofilefs_ulong_from_user+0x4e/0x74 [oprofile]
[  208.588026]   [<fdc85393>] ulong_write_file+0x2a/0x38 [oprofile]
[  208.594084]   [<c0172693>] vfs_write+0xaf/0x138
[  208.598658]   [<c0172c5d>] sys_write+0x3d/0x61
[  208.603171]   [<c0104060>] syscall_call+0x7/0xb
[  208.607751]   [<ffffffff>] 0xffffffff
[  208.611478] irq event stamp: 575782
[  208.614960] hardirqs last  enabled at (575781): [<c0102ac2>] default_idle+0x3e/0x59
[  208.622645] hardirqs last disabled at (575782): [<c0104ae9>] call_function_interrupt+0x29/0x38
[  208.631281] softirqs last  enabled at (575768): [<c0126537>] __do_softirq+0xe4/0xea
[  208.638965] softirqs last disabled at (575759): [<c01069b5>] do_softirq+0x64/0xd1
[  208.646478]
[  208.646479] other info that might help us debug this:
[  208.653003] no locks held by swapper/0.
[  208.656832]
[  208.656833] stack backtrace:
[  208.661199]  [<c0105039>] show_trace_log_lvl+0x1a/0x2f
[  208.666350]  [<c0105720>] show_trace+0x12/0x14
[  208.670811]  [<c01057d2>] dump_stack+0x16/0x18
[  208.675272]  [<c013c57f>] print_usage_bug+0x140/0x14a
[  208.680336]  [<c013cd8a>] mark_lock+0xa1/0x40b
[  208.684796]  [<c013db15>] __lock_acquire+0x3b3/0xba1
[  208.689775]  [<c013e36b>] ...
-

From: Jiri Kosina
Date: Wednesday, March 28, 2007 - 9:12 am

Perhaps something like the one below?


From: Jiri Kosina <jkosina@suse.cz>

oprofile: fix potential deadlock on oprofilefs_lock

nmi_cpu_setup() is called from hardirq context and acquires oprofilefs_lock.
alloc_event_buffer() and oprofilefs_ulong_from_user() acquire this lock
without disabling irqs, which could deadlock.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>

 drivers/oprofile/event_buffer.c |    5 +++--
 drivers/oprofile/oprofilefs.c   |    5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/oprofile/event_buffer.c b/drivers/oprofile/event_buffer.c
index 00e937e..e7fbac5 100644
--- a/drivers/oprofile/event_buffer.c
+++ b/drivers/oprofile/event_buffer.c
@@ -70,11 +70,12 @@ void wake_up_buffer_waiter(void)
 int alloc_event_buffer(void)
 {
 	int err = -ENOMEM;
+	unsigned long flags;
 
-	spin_lock(&oprofilefs_lock);
+	spin_lock_irqsave(&oprofilefs_lock, flags);
 	buffer_size = fs_buffer_size;
 	buffer_watershed = fs_buffer_watershed;
-	spin_unlock(&oprofilefs_lock);
+	spin_unlock_irqrestore(&oprofilefs_lock, flags);
  
 	if (buffer_watershed >= buffer_size)
 		return -EINVAL;
diff --git a/drivers/oprofile/oprofilefs.c b/drivers/oprofile/oprofilefs.c
index 6e67b42..8543cb2 100644
--- a/drivers/oprofile/oprofilefs.c
+++ b/drivers/oprofile/oprofilefs.c
@@ -65,6 +65,7 @@ ssize_t oprofilefs_ulong_to_user(unsigned long val, char __user * buf, size_t co
 int oprofilefs_ulong_from_user(unsigned long * val, char const __user * buf, size_t count)
 {
 	char tmpbuf[TMPBUFSIZE];
+	unsigned long flags;
 
 	if (!count)
 		return 0;
@@ -77,9 +78,9 @@ int oprofilefs_ulong_from_user(unsigned long * val, char const __user * buf, siz
 	if (copy_from_user(tmpbuf, buf, count))
 		return -EFAULT;
 
-	spin_lock(&oprofilefs_lock);
+	spin_lock_irqsave(&oprofilefs_lock, flags);
 	*val = simple_strtoul(tmpbuf, NULL, 0);
-	spin_unlock(&oprofilefs_lock);
+	spin_unlock_irqrestore(&oprofilefs_lock, flags);
 	return 0;
 }
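
The deadlock described in the patch description above can be illustrated with a toy single-CPU model. This is only a sketch under stated assumptions: `struct cpu_model` and all function names are hypothetical, nothing here uses real kernel primitives, and the "interrupt" is delivered by an ordinary function call in the middle of the critical section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one CPU: interrupt-enable flag and one spinlock. */
struct cpu_model {
	bool irqs_enabled;
	bool lock_held;
};

/*
 * Try to deliver a hardirq whose handler takes the same lock.
 * Returns true if the handler would spin forever, i.e. the lock is
 * already held by the code it interrupted on this CPU.
 */
static bool deliver_irq(const struct cpu_model *cpu)
{
	assert(cpu != NULL);
	if (!cpu->irqs_enabled)
		return false;  /* interrupt stays pending until irqs are back on */
	return cpu->lock_held;
}

/* Models spin_lock()/spin_unlock(): interrupts remain enabled. */
bool plain_lock_section_deadlocks(void)
{
	struct cpu_model cpu = { .irqs_enabled = true, .lock_held = false };
	bool dead;

	cpu.lock_held = true;          /* spin_lock() */
	dead = deliver_irq(&cpu);      /* interrupt arrives mid-section */
	cpu.lock_held = false;         /* spin_unlock() */
	return dead;
}

/* Models spin_lock_irqsave()/spin_unlock_irqrestore(). */
bool irqsave_section_deadlocks(void)
{
	struct cpu_model cpu = { .irqs_enabled = true, .lock_held = false };
	bool saved, dead;

	saved = cpu.irqs_enabled;      /* save the interrupt state */
	cpu.irqs_enabled = false;      /* disable interrupts */
	cpu.lock_held = true;          /* take the lock */
	dead = deliver_irq(&cpu);      /* interrupt cannot run here */
	cpu.lock_held = false;         /* release the lock */
	cpu.irqs_enabled = saved;      /* restore the interrupt state */
	return dead;
}
```

In the model, the plain-lock section reports a deadlock because the handler runs while the lock is held, and the irqsave section does not because delivery is deferred until the lock has been released.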

-

From: Michal Piotrowski
Date: Wednesday, March 28, 2007 - 9:51 am

Problem seems to be fixed. Thanks!

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-

From: Linus Torvalds
Date: Wednesday, March 28, 2007 - 10:56 am

NO!

If you do this, then you must make all *callers* be global too. But they 
aren't. Right now all callers do per-CPU setup!

See for example enable_lapic_nmi_watchdog():

	on_each_cpu(setup_apic_nmi_watchdog, NULL, 0, 1);

where "setup_apic_nmi_watchdog()" will call "setup_k7_watchdog()", which 
in turn will do a per-CPU reservation of the perfctl for the watchdog.

So I agree in that it probably doesn't make sense to have NMI/perfctl 
reservation per-CPU, but you can't just change the reservation and ignore 
all the *users* of that reservation that assumed that it was per-CPU.

Is that code insane? Probably. But it probably also works. After your 
patch, one CPU will be able to reserve the NMI/perfctl thing (fine so far) 
but then all the other CPU's that try to do it will fail.
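
A toy user-space model of the objection above, assuming four CPUs and a non-atomic stand-in for test_and_set_bit(). All names here (`reserve_percpu`, `reserve_global`, `count_successful_setups`) are hypothetical and only mimic the shape of the reservation code; the loop stands in for on_each_cpu().

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS     4  /* hypothetical CPU count for the model */
#define MAX_COUNTER 8  /* hypothetical number of counter bits */

static unsigned long percpu_owner[NR_CPUS]; /* models the DEFINE_PER_CPU bitmap */
static unsigned long global_owner;          /* models the single global bitmap */

/* Non-atomic stand-in for test_and_set_bit(): returns the old bit value. */
static bool model_test_and_set_bit(unsigned int bit, unsigned long *word)
{
	unsigned long mask = 1UL << bit;
	bool was_set = (*word & mask) != 0;

	*word |= mask;
	return was_set;
}

/* Reserve a counter in this CPU's private bitmap; returns 1 on success. */
int reserve_percpu(unsigned int cpu, unsigned int counter)
{
	assert(cpu < NR_CPUS && counter < MAX_COUNTER);
	return !model_test_and_set_bit(counter, &percpu_owner[cpu]);
}

/* Reserve a counter in the shared bitmap; returns 1 on success. */
int reserve_global(unsigned int counter)
{
	assert(counter < MAX_COUNTER);
	return !model_test_and_set_bit(counter, &global_owner);
}

/* Run the per-CPU setup once on every CPU and count the successes. */
int count_successful_setups(bool use_global, unsigned int counter)
{
	int ok = 0;

	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		ok += use_global ? reserve_global(counter)
				 : reserve_percpu(cpu, counter);
	return ok;
}
```

With per-CPU bitmaps every CPU's setup succeeds; with the global bitmap only the first caller gets the counter and the remaining CPUs see it as taken, unless the callers are changed to reserve once globally.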

			Linus
-

From: Michal Piotrowski
Date: Tuesday, March 27, 2007 - 11:34 am

Hi,


Suspend to disk doesn't work for me with this patch. It hangs after
PM: Preparing devices for restore.
Suspending console(s)
during resuming.

a504e64ab42bcc27074ea37405d06833ed6e0820 is first bad commit
commit a504e64ab42bcc27074ea37405d06833ed6e0820
Author: Stephen Hemminger <shemminger@linux-foundation.org>
Date:   Fri Feb 2 08:22:53 2007 -0800

    skge: WOL support

    Add WOL support for Yukon chipsets in skge device.

    Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
5845b004228d811de912a55da6a7843b72f23f81 M      drivers

http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc5/git-config2

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-

From: Pavel Machek
Date: Tuesday, March 27, 2007 - 3:29 pm

Do you use skge as your network device?
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Michal Piotrowski
Date: Tuesday, March 27, 2007 - 3:55 pm

Yes, I have a Marvell based onboard NIC.

02:05.0 Ethernet controller: 3Com Corporation 3c940 10/100/1000Base-T [Marvell] (rev 12)
        Subsystem: ASUSTeK Computer Inc. A7V600/P4P800/K8V motherboard
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64 (5750ns min, 7750ns max), Cache Line Size: 16 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: Memory at f7ffc000 (32-bit, non-prefetchable) [size=16K]
        Region 1: I/O ports at e800 [size=256]
        Capabilities: [48] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-
