Re: Linux 2.6.21-rc1

To: Linux Kernel Mailing List <linux-kernel@...>
Date: Wednesday, February 21, 2007 - 12:53 am

Ok, the merge window for 2.6.21 has closed, and -rc1 is out there.

There's a lot of changes, as is usual for an -rc1 thing, but at least so 
far it would seem that 2.6.20 has been a good base, and I don't think we 
have anything *really* scary here.

The most interesting core change may be the dyntick/nohz one, where timer 
ticks will only happen when needed. It's been brewing for a _loong_ time, 
but it's in the standard kernel now as an option. 
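
To make the idea concrete, here is a minimal user-space sketch (not kernel code) of the difference between a periodic tick and a tick that only happens when needed. The function names, the tick-based time model, and the `NO_TIMER` sentinel are all assumptions made for illustration; the real dyntick code is considerably more involved.

```c
#include <stdint.h>

/* Hypothetical sentinel: no timer is pending, so the CPU could sleep indefinitely. */
#define NO_TIMER UINT64_MAX

/* Periodic mode: the timer interrupt fires every tick, idle or not. */
static uint64_t next_wakeup_periodic(uint64_t now)
{
	return now + 1;
}

/*
 * Tickless-idle mode (illustrative): a busy CPU still ticks, but an idle
 * CPU programs its next interrupt for the earliest pending timer expiry.
 * Times are in ticks; timers[] holds absolute expiry times.
 */
static uint64_t next_wakeup_nohz(uint64_t now, const uint64_t *timers,
				 int nr_timers, int cpu_is_idle)
{
	uint64_t earliest = NO_TIMER;
	int i;

	if (!cpu_is_idle)
		return now + 1;

	for (i = 0; i < nr_timers; i++) {
		/* An already-expired timer is handled on the very next tick. */
		uint64_t expiry = timers[i] <= now ? now + 1 : timers[i];

		if (expiry < earliest)
			earliest = expiry;
	}
	return earliest;
}
```

With timers due at ticks 120, 150 and 400, an idle CPU at tick 100 would sleep until tick 120 in this model instead of taking an interrupt on every tick in between.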

But there's a ton of architecture updates (arm, mips, powerpc, x86, you 
name it), ACPI updates, and lots of driver work. And just a lot of 
cleanups.

Have fun,

			Linus
-
To: Zwane Mwaikambo <zwane@...>
Cc: Dave Jones <davej@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Wednesday, February 21, 2007 - 2:34 pm

I'm getting an undefined symbol with CONFIG_AGP=m:

WARNING: "compat_agp_ioctl" [drivers/char/agp/agpgart.ko] undefined!

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."
-
To: Andreas Schwab <schwab@...>
Cc: Zwane Mwaikambo <zwane@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Wednesday, February 21, 2007 - 2:40 pm

On Wed, Feb 21, 2007 at 07:34:01PM +0100, Andreas Schwab wrote:
> I'm getting an undefined symbol with CONFIG_AGP=m:
> 
> WARNING: "compat_agp_ioctl" [drivers/char/agp/agpgart.ko] undefined!

Fix went to Linus an hour ago.
It's been in -mm for a week, and agpgart.git for a day or so.

		Dave

-- 
http://www.codemonkey.org.uk
-
To: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, David P. Reed <dpreed@...>, Ayaz Abdulla <aabdulla@...>, <jgarzik@...>, <netdev@...>, Albert Hopkins <kernel@...>, Bob Tracy <rct@...>, Mark Brown <broonie@...>, Michael S. Tsirkin <mst@...>, Thomas Gleixner <tglx@...>, Ingo Molnar <mingo@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Daniel Walker <dwalker@...>
Date: Monday, February 26, 2007 - 6:05 pm

This email lists some known regressions in 2.6.21-rc1 compared to 2.6.20
that are not yet fixed in Linus' tree.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, the maintainer of an affected subsystem or driver, a patch
of yours caused a breakage, or I'm considering you in any other way possibly
involved with one or more of these issues.

Due to the large number of recipients, please trim the Cc when answering.


Subject    : forcedeth no longer works
References : http://bugzilla.kernel.org/show_bug.cgi?id=8090
Submitter  : David P. Reed <dpreed@reed.com>
Caused-By  : Ayaz Abdulla <aabdulla@nvidia.com>
Status     : unknown


Subject    : forcedeth: skb_over_panic
References : http://bugzilla.kernel.org/show_bug.cgi?id=8058
Submitter  : Albert Hopkins <kernel@marduk.letterboxes.org>
Status     : unknown


Subject    : natsemi ethernet card not detected correctly
References : http://lkml.org/lkml/2007/2/23/4
             http://lkml.org/lkml/2007/2/23/7
Submitter  : Bob Tracy <rct@gherkin.frus.com>
Caused-By  : Mark Brown <broonie@sirena.org.uk>
Handled-By : Mark Brown <broonie@sirena.org.uk>
Patch      : http://lkml.org/lkml/2007/2/23/142
Status     : patch available


Subject    : ThinkPad T60: system doesn't come out of suspend to RAM
             (CONFIG_NO_HZ)
References : http://lkml.org/lkml/2007/2/22/391
Submitter  : Michael S. Tsirkin <mst@mellanox.co.il>
Handled-By : Thomas Gleixner <tglx@linutronix.de>
             Ingo Molnar <mingo@elte.hu>
Status     : unknown


Subject    : kernel BUG at kernel/time/tick-sched.c:168  (CONFIG_NO_HZ)
References : http://lkml.org/lkml/2007/2/16/346
Submitter  : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Status     : problem is being debugged


Subject    : BUG: soft lockup detected on CPU#0
             NOHZ: local_softirq_pending 20  (SMT scheduler)
References : h...
To: Adrian Bunk <bunk@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, David P. Reed <dpreed@...>, Ayaz Abdulla <aabdulla@...>, <jgarzik@...>, <netdev@...>, Albert Hopkins <kernel@...>, Bob Tracy <rct@...>, Mark Brown <broonie@...>, Michael S. Tsirkin <mst@...>, Ingo Molnar <mingo@...>, Michal Piotrowski <michal.k.k.piotrowski@...>
Date: Tuesday, February 27, 2007 - 4:21 am

Adrian,


The BUG_ON() was replaced by a warning printk(). The BUG_ON() exposed a

Patch available, not confirmed yet.

	tglx


-
To: Thomas Gleixner <tglx@...>
Cc: Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, David P. Reed <dpreed@...>, Ayaz Abdulla <aabdulla@...>, <jgarzik@...>, <netdev@...>, Albert Hopkins <kernel@...>, Bob Tracy <rct@...>, Mark Brown <broonie@...>, Michael S. Tsirkin <mst@...>, Ingo Molnar <mingo@...>, Michal Piotrowski <michal.k.k.piotrowski@...>
Date: Tuesday, February 27, 2007 - 4:33 am

I can confirm that the bug is fixed (over 20 hours of testing should be enough).

Huge thanks!

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-
To: Michal Piotrowski <michal.k.k.piotrowski@...>
Cc: Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, David P. Reed <dpreed@...>, Ayaz Abdulla <aabdulla@...>, <jgarzik@...>, <netdev@...>, Albert Hopkins <kernel@...>, Bob Tracy <rct@...>, Mark Brown <broonie@...>, Michael S. Tsirkin <mst@...>, Ingo Molnar <mingo@...>, Michal Piotrowski <michal.k.k.piotrowski@...>
Date: Tuesday, February 27, 2007 - 4:35 am

^^^^ almost ;)

Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-
To: Michal Piotrowski <michal.k.k.piotrowski@...>
Cc: Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, David P. Reed <dpreed@...>, Ayaz Abdulla <aabdulla@...>, <jgarzik@...>, <netdev@...>, Albert Hopkins <kernel@...>, Bob Tracy <rct@...>, Mark Brown <broonie@...>, Michael S. Tsirkin <mst@...>, Mike Galbraith <efault@...>
Date: Tuesday, February 27, 2007 - 4:33 am

thanks a lot! I think this thing was a long-term performance/latency 
regression in HT scheduling as well.

	Ingo
-
To: Ingo Molnar <mingo@...>
Cc: Michal Piotrowski <michal.k.k.piotrowski@...>, Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, David P. Reed <dpreed@...>, Ayaz Abdulla <aabdulla@...>, <jgarzik@...>, <netdev@...>, Albert Hopkins <kernel@...>, Bob Tracy <rct@...>, Mark Brown <broonie@...>, Michael S. Tsirkin <mst@...>
Date: Tuesday, February 27, 2007 - 4:54 am

Agreed.

I was recently looking at that spot because I found that niced tasks
were taking latency hits, and disabled it, which helped a bunch.  I also
can't understand why it would be OK to interleave a normal task with an
RT task sometimes, but not others.. that's meaningless to the RT task.

IMHO, SMT scheduling should be a buyer beware thing.  Maximizing your
core utilization comes at a price, but so does disabling it, so I think
letting the user decide what he wants is the right thing to do.

	-Mike

-
To: Mike Galbraith <efault@...>
Cc: Ingo Molnar <mingo@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Tuesday, February 27, 2007 - 7:07 pm

Apologies for the resend, lkml address got mangled...


Ingo I'm going to have to partially disagree with you on this. 

This has only become a problem because of what happens with dynticks now when 
rq->curr == rq->idle. Prior to this, that particular SMT code only led to 
relative delays in scheduling for lower priority tasks. Whether or not that 
task is ksoftirqd should not matter because it is not like they are starved 
indefinitely, it is only that nice 19 tasks are relatively delayed, which by 
definition is implied with the usage of nice as a scheduler hint wouldn't you 
say? I know it has been discussed many times before as to whether 'nice' 
means less cpu and/or more latency, but in our current implementation, nice 
means both less cpu and more latency. So to me, the kernels without dynticks 
do not have a regression. This seems to only be a problem in the setting of 
the new dynticks code IMHO. That's not to say it isn't a bug! Nor am I saying 
that dynticks is a problem! Please don't misinterpret that.

The second issue is that this is a problem because of the fuzzy definition of 
what idle is for a runqueue in the setting of this SMT code. Normally, 
rq->curr == rq->idle means the runqueue is idle, but not with this code since 
there are still rq->nr_running on that runqueue. What dynticks in this 
implementation is doing is trying to idle a hyperthread sibling on a cpu 
whose logical partner is busy. I did not find that added any power saving on 
my earlier dynticks implementation, and found it easier to keep that sibling 
ticking at the same rate as its partner. Of course you may have found 
something different, and I definitely agree with what you are likely to say 
in response to this- we shouldn't have to special case logical siblings as 
having a different definition of idle than any other smp case. Ultimately, 
that leaves us with your simple patch as a reasonable solution for the 
dynticks case even though it does change the behaviour dramatically...
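
The ambiguity described above can be sketched in a few lines. This is a simplified user-space model, not the kernel's `struct rq`; only the field names `curr`, `idle` and `nr_running` come from the discussion, and the two helper names are made up for illustration.

```c
#include <stdbool.h>

struct task { int prio; };

/* Stripped-down stand-in for a per-CPU runqueue. */
struct rq {
	struct task *curr;		/* task currently on the CPU */
	struct task *idle;		/* this CPU's idle task */
	unsigned long nr_running;	/* runnable tasks queued here */
};

/* The usual test: the CPU is executing its idle task. */
static bool rq_runs_idle_task(const struct rq *rq)
{
	return rq->curr == rq->idle;
}

/* The stricter test: idle task running and nothing runnable queued. */
static bool rq_has_no_work(const struct rq *rq)
{
	return rq_runs_idle_task(rq) && rq->nr_running == 0;
}
```

When a queued task is being held back, the first helper returns true while the second returns false, which is the state the tickless code has to cope with.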
To: Con Kolivas <kernel@...>
Cc: Ingo Molnar <mingo@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Wednesday, February 28, 2007 - 12:21 am

(hrmph.  having to copy/paste/try again.  evolution seems to be broken..
RCPT TO <linux-kernel@vg.kolivas.org> failed: Cannot resolve your domain {mp049}
..caused me to be unable to send despite receipts being disabled)


No I'm not, but let's go further in that direction just for the sake of
argument.  You're then saying that you prefer realtime priorities to not
work in the HT setting, given that realtime tasks don't participate in
the 'single stream me' program.
 
I'm saying only that we're defeating the purpose of HT, and overriding a


So?  User asked for HT.  That's hardware multiplexing. It ain't free.

I don't think it does actually. Let your RT task sleep regularly, and
ever so briefly.  We don't evict lower priority tasks from siblings upon

To me, the reason for interleaving is solely about keeping the core

Re-read this paragraph with realtime task priorities in mind, or for
that matter, dynamic priorities.  If you carry your priority/throughput
argument to its logical conclusion, only instruction streams of
absolutely equal priority should be able to share the core at any given
time.  You may as well just disable HT and be done with it.

To me, siblings are logically separate units, and should be treated as
such (as they mostly are).  They share an important resource, but so do
physically discrete units.

	-Mike


-
To: Mike Galbraith <efault@...>
Cc: Ingo Molnar <mingo@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Wednesday, February 28, 2007 - 6:01 pm

Where do I say that? I do not presume to manage realtime priorities in any 
way. You're turning my argument about nice levels around and somehow saying 
that because hyperthreading breaks the single stream me semantics by 
parallelising them that I would want to stop that happening. Nowhere have I 
argued that realtime semantics should be changed to somehow work around 
hyperthreading. SMT nice is about managing nice only, and not realtime 

But the buyer is not aware. You are aware because you tinker, but the vast 
majority of users who enable hyperthreading in their shiny pcs are not aware. 
The only thing they know is that if they enable hyperthreading their programs 
run slower in multitasking environments no matter how much they nice the 
other processes. Buyers do not buy hardware knowing that the internal design 
breaks something as fundamental as 'nice'. You seem to presume that most 
people who get hyperthreading are happy to compromise 'nice' in order to get 
their second core working and I put it to you that they do not make that 

Well you know as well as I do that you're selecting out the exception rather 

Well that's certainly taking my logic for a ride. This is about managing 
_nice_ and _only_ nice. Nice specifies fixed interval static priorities where 
(at the risk of repeating myself) you are specifying that you wish tasks with 
higher nice values to receive less cpu and more latency. Dynamic priorities have 
absolutely no effect on what the discrepancies are between the static 
priorities of differing nice values. As for realtime priorities, again, I do 
not presume to be managing them with SMT nice. They are unique entities 
unrelated to nice values. The only thing they have in common with nice levels 
is that if something is running without a realtime priority, it should be 
preempted by the realtime task as you have specified that the realtime task 
should receive all the cpu over the non-realtime task. I don't pretend that 
there is some cpu percentage relations...
To: Con Kolivas <kernel@...>
Cc: Ingo Molnar <mingo@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Wednesday, February 28, 2007 - 8:02 pm

I see no real difference between the two assertions.  Nice is just a
mechanism to set priority, so I applied your assertion to a different
range of priorities than nice covers, and returned it to show that the
code contradicts itself.  It can't be bad for a nice 1 task to run with
a nice 0 task, but OK for a minimum RT task to run with a maximum RT
task.  Iff HT without corrective measures breaks nice, then it breaks


To me it's pretty much black and white.  Either you want to split your
cpu into logical units, which means each has less to offer than the

I don't agree that it's the exception, and if you look at this HT thing
from the split cpu perspective, I'm not sure there's even a problem.

Scrolling down, I see that this is getting too long, and we aren't

The above will have to do.

	-Mike

-
To: Mike Galbraith <efault@...>
Cc: Con Kolivas <kernel@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Thursday, March 1, 2007 - 4:46 am

i'm starting to lean towards your view that we should not artificially 
keep tasks from running, when there's a free CPU available. We should 
still keep the 'other half' of SMT scheduling: the immediate pushing of 
tasks to a related core, but this bit of 'do not run tasks on this CPU' 
dependent-sleeper logic is i think a bit fragile. Plus these days SMT 
siblings do not tend to influence each other in such a negative way as 
older P4 ones where a HT sibling would slow down the other sibling 
significantly.

plus with an increasing number of siblings (which seems like an 
inevitable thing on the hardware side), the dependent-sleeper logic 
becomes less and less scalable. We'd have to cross-check every other 
'related' CPU's current priority to decide what to run.
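
A rough sketch of why this scales poorly: before a CPU picks a task, it has to look at what every sibling is currently running. The rule below (hold a task back if any sibling runs something more than `max_gap` priority levels above it) is a deliberately simplified, hypothetical stand-in for the dependent-sleeper heuristics, and all names are illustrative.

```c
#include <stdbool.h>

/*
 * Lower numbers mean higher priority. sibling_prio[i] is the priority of
 * the task currently running on sibling i; 'self' is the CPU deciding.
 * Returns true if the candidate should be kept off the CPU for now.
 */
static bool should_hold_back(const int *sibling_prio, int nr_siblings,
			     int self, int candidate_prio, int max_gap)
{
	int i;

	/* One comparison per sibling on every scheduling decision. */
	for (i = 0; i < nr_siblings; i++) {
		if (i == self)
			continue;
		if (candidate_prio - sibling_prio[i] > max_gap)
			return true;
	}
	return false;
}
```

The loop is cheap with two siblings, but the work grows with every additional thread that shares the core.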

if anything, there should be a mechanism /in the hardware/ to set the 
priority of a CPU - and then the hardware could decide how to prioritize 
between siblings. Doing this in software is really hard.

	Ingo
-
To: Ingo Molnar <mingo@...>
Cc: Mike Galbraith <efault@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Thomas Gleixner <tglx@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Thursday, March 1, 2007 - 7:13 am

Well it is meant to be tuned to the cpu type in per_cpu_gain. So it should be 
easy to be set to the appropriate scaling. It was never meant to be a one 

Yes even I've commented before that this current system is unworkable come 
multiple shared power threads. This I do see as a real problem with it - in 

And that's the depressing part because of course I was interested in that as 
the original approach to the problem (and it was a big problem). When I spoke 
to Intel and AMD (of course to date no SMT AMD chip exists) at kernel summit 
they said it was too hard to implement hardware priorities well. Which is 
real odd since IBM have already done it with Power...

Still I think it has been working fine in software till now, but now it has to 
deal with the added confusion of dynticks, so I already know what will happen 
to it.

Hrm it's been a good time for my code all round... I think I'll just swap 
prefetch myself up the staircase to some pluggable scheduler that would 
hyperthread me to sleep as an idle priority task.

-- 
-ck
-
To: Con Kolivas <kernel@...>
Cc: Ingo Molnar <mingo@...>, Mike Galbraith <efault@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Thursday, March 1, 2007 - 7:33 am

Well, it's not a dyntick problem in the first place. Even w/o dynticks
we go idle with local_softirq_pending(). Dynticks contains an explicit
check for that, which makes it visible.
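
As a sketch of what such a check amounts to (a user-space model, not the actual tick code): before stopping the tick on an idle CPU, look at the pending-softirq bitmask and complain if it is non-zero. The function name is hypothetical, and the message is modelled on the "NOHZ: local_softirq_pending 20" line quoted in the regression list; the hexadecimal formatting is an assumption.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Returns true if the tick may be stopped. 'pending' is a bitmask with one
 * bit per softirq that has been raised but not yet run on this CPU.
 */
static bool may_stop_tick(unsigned int pending, FILE *log)
{
	if (pending) {
		/* Going idle with work outstanding: report it, keep ticking. */
		fprintf(log, "NOHZ: local_softirq_pending %02x\n", pending);
		return false;
	}
	return true;
}
```

Without such a check the same condition still occurs; it simply goes unreported.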

	tglx


	

-
To: <tglx@...>
Cc: Ingo Molnar <mingo@...>, Mike Galbraith <efault@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Thursday, March 1, 2007 - 8:05 am

Oops I'm sorry if I made it sound like there's a dynticks problem. That was 
not my intent and I said as much in an earlier email. Even though I'm finding 
myself defending code that has already been softly tagged for redundancy, 
let's be clear here; we're talking about at most a further 70ms delay in 
scheduling a niced task in the presence of a nice 0 task, which is a 
reasonable delay for ksoftirqd which we nice the eyeballs out of in mainline. 
Considering under load our scheduler has been known to cause scheduling 
delays of 10 seconds I still don't see this as a bug. Dynticks just "points 
it out to us".

-- 
-ck
-
To: Con Kolivas <kernel@...>
Cc: <tglx@...>, Mike Galbraith <efault@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Thursday, March 1, 2007 - 9:30 am

well, not running softirqs when we could is a bug. It's not a big bug, 
but it's a bug nevertheless. It doesn't matter that softirqs could be 
delayed even worse under high load - there was no 'high load' here.

	Ingo
-
To: Ingo Molnar <mingo@...>
Cc: <tglx@...>, Mike Galbraith <efault@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Thursday, March 1, 2007 - 5:51 pm

Gotcha. I'll prepare a smt-nice removal patch shortly. 

-- 
-ck
-
To: Ingo Molnar <mingo@...>
Cc: <tglx@...>, Mike Galbraith <efault@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Thursday, March 1, 2007 - 6:33 pm

Remove the SMT-nice feature which idles sibling cpus on SMT cpus to
facilitate nice working properly where cpu power is shared. The idling
of cpus in the presence of runnable tasks is considered too fragile, easy
to break with outside code, and the complexity of managing this system if an
architecture comes along with many logical cores sharing cpu power will be
unworkable.

Remove the associated per_cpu_gain variable in sched_domains used only by
this code.

Signed-off-by: Con Kolivas <kernel@kolivas.org>

---
 include/asm-i386/topology.h           |    1 
 include/asm-ia64/topology.h           |    2 
 include/asm-mips/mach-ip27/topology.h |    1 
 include/asm-powerpc/topology.h        |    1 
 include/asm-x86_64/topology.h         |    1 
 include/linux/sched.h                 |    1 
 include/linux/topology.h              |    4 
 kernel/sched.c                        |  155 ----------------------------------
 8 files changed, 1 insertion(+), 165 deletions(-)

Index: linux-2.6.21-rc2/kernel/sched.c
===================================================================
--- linux-2.6.21-rc2.orig/kernel/sched.c	2007-03-02 08:56:45.000000000 +1100
+++ linux-2.6.21-rc2/kernel/sched.c	2007-03-02 08:58:40.000000000 +1100
@@ -3006,23 +3006,6 @@ static inline void idle_balance(int cpu,
 }
 #endif
 
-static inline void wake_priority_sleeper(struct rq *rq)
-{
-#ifdef CONFIG_SCHED_SMT
-	if (!rq->nr_running)
-		return;
-
-	spin_lock(&rq->lock);
-	/*
-	 * If an SMT sibling task has been put to sleep for priority
-	 * reasons reschedule the idle task to see if it can now run.
-	 */
-	if (rq->nr_running)
-		resched_task(rq->idle);
-	spin_unlock(&rq->lock);
-#endif
-}
-
 DEFINE_PER_CPU(struct kernel_stat, kstat);
 
 EXPORT_PER_CPU_SYMBOL(kstat);
@@ -3239,10 +3222,7 @@ void scheduler_tick(void)
 
 	update_cpu_clock(p, rq, now);
 
-	if (p == rq->idle)
-		/* Task on the idle queue */
-		wake_priority_sleeper(rq);
-	else
+	if (p != rq->idl...
To: Con Kolivas <kernel@...>
Cc: Ingo Molnar <mingo@...>, Mike Galbraith <efault@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Thursday, March 1, 2007 - 8:20 am

Well, dyntick might end up to delay it for X seconds as well, which _is_
observable and that's why the check was put there in the first place.

	tglx


-
To: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Pavel Machek <pavel@...>, Marcel Holtmann <marcel@...>, <linux-pm@...>, Michael S. Tsirkin <mst@...>, Ingo Molnar <mingo@...>, <lenb@...>, <linux-acpi@...>, <luming.yu@...>, Arkadiusz Miskiewicz <arekm@...>, Konstantin Karasyov <konstantin.a.karasyov@...>, <linux-usb-devel@...>, Ismail <ismail@...>, Fabio Comolli <fabio.comolli@...>, Thomas Meyer <thomas.mey@...>, Andrew Nelless <andrew@...>, Antonino A. Daplas <adaplas@...>, Janosch Machowinski <jmachowinski@...>, <vladimir.p.lebedev@...>, Lukas Hejtmanek <xhejtman@...>, Meelis Roos <mroos@...>, <jgarzik@...>, <linux-ide@...>, Tejun Heo <htejun@...>, Jean-Luc Coulon <jean.luc.coulon@...>, Markus Trippelsdorf <markus@...>, Rafael J. Wysocki <rjw@...>
Date: Monday, February 26, 2007 - 6:01 pm

This email lists some known regressions in 2.6.21-rc1 compared to 2.6.20
that are not yet fixed in Linus' tree.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, the maintainer of an affected subsystem or driver, a patch
of yours caused a breakage, or I'm considering you in any other way possibly
involved with one or more of these issues.

Due to the large number of recipients, please trim the Cc when answering.


Subject    : resume: slab error in verify_redzone_free(): cache `size-512':
                     memory outside object was overwritten
References : http://lkml.org/lkml/2007/2/24/41
Submitter  : Pavel Machek <pavel@ucw.cz>
Handled-By : Marcel Holtmann <marcel@holtmann.org>
Status     : unknown


Subject    : ThinkPad T60: no screen after suspend to RAM
References : http://lkml.org/lkml/2007/2/22/391
Submitter  : Michael S. Tsirkin <mst@mellanox.co.il>
Handled-By : Ingo Molnar <mingo@elte.hu>
Status     : unknown


Subject    : HP nx6325 notebook: usb mouse stops working after suspend to ram
References : http://lkml.org/lkml/2007/2/21/413
Submitter  : Arkadiusz Miskiewicz <arekm@maven.pl>
Caused-By  : Konstantin Karasyov <konstantin.a.karasyov@intel.com>
             commit 0a6139027f3986162233adc17285151e78b39cac
Status     : unknown


Subject    : ACPI update breaks kpowersave
References : http://lkml.org/lkml/2007/2/10/7
Submitter  : Ismail Dönmez <ismail@pardus.org.tr>
             Fabio Comolli <fabio.comolli@gmail.com>
Status     : unknown


Subject    : MacBook: AE_NOT_FOUND ACPI messages
References : http://bugzilla.kernel.org/show_bug.cgi?id=8066
Submitter  : Thomas Meyer <thomas.mey@web.de>
Status     : unknown


Subject    : Asus A8N-VM motherboard:
             framebuffer/console boot failure (ACPI related)
References : http://lkml.org/lkml/2007/2/23/132
Submitter  : Andrew Nelless <andrew@nelless.net>
Handled-By : Antonino A. Daplas <...
To: Adrian Bunk <bunk@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, <linux-acpi@...>, <linux-ide@...>
Date: Tuesday, February 27, 2007 - 9:00 am

Still appears, but this does not seem to be a 40/80 pin cable problem. 
Rather, ata-piix is calling some ACPI methods and this results in ACPI 
errors.

-- 
Meelis Roos (mroos@linux.ee)
-
To: Meelis Roos <mroos@...>
Cc: Adrian Bunk <bunk@...>, Linux Kernel Mailing List <linux-kernel@...>, <linux-acpi@...>, <linux-ide@...>
Date: Tuesday, February 27, 2007 - 10:16 am

On Tue, 27 Feb 2007 15:00:29 +0200 (EET)

There are two separate problems showing up in the one trace - broken ACPI
spew and wrong cable detect. I don't think they are related.
-
To: Adrian Bunk <bunk@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Pavel Machek <pavel@...>, <linux-pm@...>, Ingo Molnar <mingo@...>, <lenb@...>, <linux-acpi@...>
Date: Wednesday, February 28, 2007 - 5:13 pm

Just reproduced this in -rc2.
Another thing I noticed:
with 2.6.20, pressing Fn/F4 generates an ACPI event and triggers suspend to RAM.

On 2.6.21-rc2, after resume (when the box is accessible from network),
pressing Fn/F4 again does not seem to have any effect.


-- 
MST
-
To: Michael S. Tsirkin <mst@...>
Cc: Adrian Bunk <bunk@...>, Linux Kernel Mailing List <linux-kernel@...>, Pavel Machek <pavel@...>, <linux-pm@...>, Ingo Molnar <mingo@...>, <lenb@...>, <linux-acpi@...>
Date: Wednesday, February 28, 2007 - 11:45 pm

I have the same problem on my IBM X60s on rc1 and rc2. Can't resume
from RAM, can't suspend to disk. Is it possible to revert all the
changes to ACPI and test it?

Jeff.
-
To: Jeff Chua <jeff.chua.linux@...>
Cc: Michael S. Tsirkin <mst@...>, Linux Kernel Mailing List <linux-kernel@...>, Pavel Machek <pavel@...>, <linux-pm@...>, Ingo Molnar <mingo@...>, <lenb@...>, <linux-acpi@...>
Date: Sunday, March 4, 2007 - 8:04 pm

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-
To: Adrian Bunk <bunk@...>
Cc: Michael S. Tsirkin <mst@...>, Linux Kernel Mailing List <linux-kernel@...>, Pavel Machek <pavel@...>, <linux-pm@...>, Ingo Molnar <mingo@...>, <lenb@...>, <linux-acpi@...>
Date: Monday, March 5, 2007 - 9:32 pm

Yes.
-
To: Adrian Bunk <bunk@...>
Cc: Michael S. Tsirkin <mst@...>, Linux Kernel Mailing List <linux-kernel@...>, Pavel Machek <pavel@...>, <linux-pm@...>, Ingo Molnar <mingo@...>, <lenb@...>, <linux-acpi@...>
Date: Tuesday, March 6, 2007 - 8:03 am

I've tried with CONFIG_KVM=n and CONFIG_KVM=y, and neither suspends.
-
To: Jeff Chua <jeff.chua.linux@...>
Cc: Adrian Bunk <bunk@...>, Linux Kernel Mailing List <linux-kernel@...>, Pavel Machek <pavel@...>, <linux-pm@...>, Ingo Molnar <mingo@...>, <lenb@...>, <linux-acpi@...>
Date: Tuesday, March 6, 2007 - 8:08 am

Do you mean that they "do not resume after suspend"?

-- 
MST
-
To: Michael S. Tsirkin <mst@...>
Cc: Adrian Bunk <bunk@...>, Linux Kernel Mailing List <linux-kernel@...>, Pavel Machek <pavel@...>, <linux-pm@...>, Ingo Molnar <mingo@...>, <lenb@...>, <linux-acpi@...>
Date: Tuesday, March 6, 2007 - 8:12 am

I can't even suspend to disk/ram. It just hangs and the lights just
blink and everything else hangs. With 2.6.20, it works fine.

Jeff.
-
To: Jeff Chua <jeff.chua.linux@...>
Cc: Michael S. Tsirkin <mst@...>, Adrian Bunk <bunk@...>, Linux Kernel Mailing List <linux-kernel@...>, <linux-pm@...>, Ingo Molnar <mingo@...>, <lenb@...>, <linux-acpi@...>
Date: Monday, March 19, 2007 - 11:32 am

Turn up console loglevel, and see where it hangs...

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To: Pavel Machek <pavel@...>
Cc: Jeff Chua <jeff.chua.linux@...>, Michael S. Tsirkin <mst@...>, Adrian Bunk <bunk@...>, Linux Kernel Mailing List <linux-kernel@...>, <linux-pm@...>, Ingo Molnar <mingo@...>, <lenb@...>, <linux-acpi@...>
Date: Monday, March 19, 2007 - 5:23 pm

I think CONFIG_DISABLE_CONSOLE_SUSPEND would have to be set for this purpose
too.

Greetings,
Rafael
-
To: Jeff Chua <jeff.chua.linux@...>
Cc: Michael S. Tsirkin <mst@...>, <linux-acpi@...>, Ingo Molnar <mingo@...>, Adrian Bunk <bunk@...>, <linux-pm@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Friday, March 2, 2007 - 8:26 am

As I said elsewhere in the thread, suspend/resume to RAM works ok on
my thinkpad x60. I posted my .config there, perhaps difference is in
it? Ingo identified KVM as possible culprit.
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To: Pavel Machek <pavel@...>
Cc: Jeff Chua <jeff.chua.linux@...>, Michael S. Tsirkin <mst@...>, <linux-acpi@...>, Ingo Molnar <mingo@...>, Adrian Bunk <bunk@...>, <linux-pm@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Saturday, March 3, 2007 - 7:17 am

I'll try your .config for kicks; the problem that Ingo pinpointed is
not what is affecting me.

-- 
Jens Axboe

-
To: Michael S. Tsirkin <mst@...>
Cc: Adrian Bunk <bunk@...>, Linux Kernel Mailing List <linux-kernel@...>, Pavel Machek <pavel@...>, <linux-pm@...>, Ingo Molnar <mingo@...>, <lenb@...>, <linux-acpi@...>
Date: Wednesday, February 28, 2007 - 5:27 pm

Can you please get the dmesg output after resume via the network?

	tglx


-
To: Thomas Gleixner <tglx@...>
Cc: Adrian Bunk <bunk@...>, Linux Kernel Mailing List <linux-kernel@...>, Pavel Machek <pavel@...>, <linux-pm@...>, Ingo Molnar <mingo@...>, <lenb@...>, <linux-acpi@...>
Date: Wednesday, February 28, 2007 - 5:40 pm

The link above has it.

-- 
MST
-
To: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Albert Hopkins <kernel@...>, Ayaz Abdulla <aabdulla@...>, <jgarzik@...>, <netdev@...>, Bob Tracy <rct@...>, Mark Brown <broonie@...>, YOSHIFUJI Hideaki / <yoshfuji@...>, Kay Sievers <kay.sievers@...>, Greg KH <greg@...>, Michael-Luke Jones <mlj28@...>, Pete Clements <clem@...>, Sid Boyce <g3vbv@...>, Chuck Lever <chuck.lever@...>, Andreas Schwab <schwab@...>, Dave Jones <davej@...>
Date: Sunday, February 25, 2007 - 2:02 pm

This email lists some known regressions in 2.6.21-rc1 compared to 2.6.20
that are not yet fixed in Linus' tree.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, the maintainer of an affected subsystem or driver, the author
of a patch that caused a breakage, or someone I consider in some other way
possibly involved with one or more of these issues.

Due to the large number of recipients, please trim the Cc when answering.


Subject    : forcedeth: skb_over_panic
References : http://bugzilla.kernel.org/show_bug.cgi?id=8058
Submitter  : Albert Hopkins <kernel@marduk.letterboxes.org>
Status     : unknown


Subject    : natsemi ethernet card not detected correctly
References : http://lkml.org/lkml/2007/2/23/4
             http://lkml.org/lkml/2007/2/23/7
Submitter  : Bob Tracy <rct@gherkin.frus.com>
Caused-By  : Mark Brown <broonie@sirena.org.uk>
Handled-By : Mark Brown <broonie@sirena.org.uk>
Patch      : http://lkml.org/lkml/2007/2/23/142
Status     : patch available


Subject    : request_module: runaway loop modprobe net-pf-1
References : http://lkml.org/lkml/2007/2/21/206
Submitter  : YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
Caused-By  : Kay Sievers <kay.sievers@vrfy.org>
             commit c353c3fb0700a3c17ea2b0237710a184232ccd7f
Handled-By : Greg KH <greg@kroah.com>
Status     : problem is being discussed


Subject    : IPV6=m, SUNRPC=y compile error
References : http://bugzilla.kernel.org/show_bug.cgi?id=8050
             http://lkml.org/lkml/2007/2/12/442
             http://lkml.org/lkml/2007/2/20/384
Submitter  : Michael-Luke Jones <mlj28@cam.ac.uk>
             Pete Clements <clem@clem.clem-digital.net>
             Sid Boyce <g3vbv@blueyonder.co.uk>
Caused-By  : Chuck Lever <chuck.lever@oracle.com>
Handled-By : YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
Status     : patch available


Subject    : WARNING: "compat_agp_ioctl" undefin...
-
To: Adrian Bunk <bunk@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Albert Hopkins <kernel@...>, Ayaz Abdulla <aabdulla@...>, <jgarzik@...>, <netdev@...>, Bob Tracy <rct@...>, Mark Brown <broonie@...>, YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@...>, Kay Sievers <kay.sievers@...>, Michael-Luke Jones <mlj28@...>, Pete Clements <clem@...>, Sid Boyce <g3vbv@...>, Chuck Lever <chuck.lever@...>, Andreas Schwab <schwab@...>, Dave Jones <davej@...>
Date: Sunday, February 25, 2007 - 4:59 pm

Patch has been reverted and submitted to Linus to pull, but he's out of
town right now...

thanks,

greg k-h
-
To: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Michael S. Tsirkin <mst@...>, Thomas Gleixner <tglx@...>, Ingo Molnar <mingo@...>, <pavel@...>, <linux-pm@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Daniel Walker <dwalker@...>
Date: Sunday, February 25, 2007 - 1:55 pm

This email lists some known regressions in 2.6.21-rc1 compared to 2.6.20
that are not yet fixed in Linus' tree.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, the maintainer of an affected subsystem or driver, the author
of a patch that caused a breakage, or someone I consider in some other way
possibly involved with one or more of these issues.

Due to the large number of recipients, please trim the Cc when answering.


Subject    : ThinkPad T60: system doesn't come out of suspend to RAM
             (CONFIG_NO_HZ)
References : http://lkml.org/lkml/2007/2/22/391
Submitter  : Michael S. Tsirkin <mst@mellanox.co.il>
             Thomas Gleixner <tglx@linutronix.de>
Handled-By : Ingo Molnar <mingo@elte.hu>
Status     : unknown


Subject    : kernel BUG at kernel/time/tick-sched.c:168  (CONFIG_NO_HZ)
References : http://lkml.org/lkml/2007/2/16/346
Submitter  : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Status     : problem is being debugged


Subject    : BUG: soft lockup detected on CPU#0
             NOHZ: local_softirq_pending 20
References : http://lkml.org/lkml/2007/2/20/257
Submitter  : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Handled-By : Thomas Gleixner <tglx@linutronix.de>
             Ingo Molnar <mingo@elte.hu>
Status     : problem is being debugged


Subject    : i386: no boot with nmi_watchdog=1  (clockevents)
References : http://lkml.org/lkml/2007/2/21/208
Submitter  : Daniel Walker <dwalker@mvista.com>
Caused-By  : Thomas Gleixner <tglx@linutronix.de>
             commit e9e2cdb412412326c4827fc78ba27f410d837e6e
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Status     : problem is being debugged


-
To: Adrian Bunk <bunk@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michael S. Tsirkin <mst@...>, Thomas Gleixner <tglx@...>, Ingo Molnar <mingo@...>, <pavel@...>, <linux-pm@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Daniel Walker <dwalker@...>
Date: Tuesday, February 27, 2007 - 6:02 am

x60 doesn't resume from S2R either, it doesn't matter if CONFIG_NO_HZ is
set or not though. 2.6.20 worked fine.

-- 
Jens Axboe

-
To: Jens Axboe <jens.axboe@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michael S. Tsirkin <mst@...>, Thomas Gleixner <tglx@...>, Ingo Molnar <mingo@...>, <pavel@...>, <linux-pm@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Daniel Walker <dwalker@...>
Date: Tuesday, February 27, 2007 - 6:09 pm

Is this

Subject    : ThinkPad T60: no screen after suspend to RAM
References : http://lkml.org/lkml/2007/2/22/391
Submitter  : Michael S. Tsirkin <mst@mellanox.co.il>
             Ingo Molnar <mingo@elte.hu>
Handled-By : Ingo Molnar <mingo@elte.hu>
Status     : unknown


cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-
To: Adrian Bunk <bunk@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michael S. Tsirkin <mst@...>, Thomas Gleixner <tglx@...>, Ingo Molnar <mingo@...>, <pavel@...>, <linux-pm@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Daniel Walker <dwalker@...>
Date: Wednesday, February 28, 2007 - 3:41 am

It doesn't resume at all.

-- 
Jens Axboe

-
To: Jens Axboe <jens.axboe@...>
Cc: Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michael S. Tsirkin <mst@...>, Thomas Gleixner <tglx@...>, Ingo Molnar <mingo@...>, <linux-pm@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Daniel Walker <dwalker@...>
Date: Tuesday, February 27, 2007 - 6:21 am

It somehow works for me. As long as I do not play with bluetooth and
suspend to disk...
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To: Pavel Machek <pavel@...>
Cc: Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michael S. Tsirkin <mst@...>, Thomas Gleixner <tglx@...>, Ingo Molnar <mingo@...>, <linux-pm@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Daniel Walker <dwalker@...>
Date: Tuesday, February 27, 2007 - 6:30 am

It locks solid here on resume, going back to 2.6.20 makes it work
perfectly again. In between 2.6.20 and 2.6.21-rc1 some ACPI change broke
resume, but that got fixed. Some other change later snuck in that broke
it AGAIN for me, sigh.

I don't use bluetooth nor suspend to disk.

-- 
Jens Axboe

-
To: Jens Axboe <jens.axboe@...>
Cc: Pavel Machek <pavel@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michael S. Tsirkin <mst@...>, Thomas Gleixner <tglx@...>, <linux-pm@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Daniel Walker <dwalker@...>
Date: Tuesday, February 27, 2007 - 6:34 am

resume is stuck on my T60 too. So you mean v2.6.21-rc1 vanilla works 
fine? Do you know a commit ID that works for sure? I'd like to bisect 
this, but this way i might just find that ACPI change that got already 
fixed later on (and then got re-broken).

	Ingo
-
To: Ingo Molnar <mingo@...>
Cc: Pavel Machek <pavel@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michael S. Tsirkin <mst@...>, Thomas Gleixner <tglx@...>, <linux-pm@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Daniel Walker <dwalker@...>
Date: Tuesday, February 27, 2007 - 6:59 am

Nope, 2.6.21-rc1 vanilla does not work. 2.6.20 works. 2.6.20-gitX worked
until some acpi change broke it, the below patch fixed that for me. That
got merged in a later 2.6.20-gitY, but then some other patch broke it
again so that 2.6.21-rc1 is broken. Not much luck there :-)

So it looks like:

- c5a7156959e89b32260ad6072bbf5077bcdfbeee broke 2.6.20-git
- f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38 should fix that.
- Something later than f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38 broke it

Yeah, it gets trickier. I'll try
f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38 now and see if that works, then
bisect to 2.6.21-rc1 to find the other offender. I hope the other
offender didn't get added before
f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38, we'll see :-)
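
The bisect plan above is a binary search between a known-good and a
known-bad commit. A minimal sketch in C, assuming a linear history in which
every commit before the offender works and every commit from the offender
onward fails (the toy history, `toy_commit_works()` and
`first_bad_commit()` are hypothetical names for illustration, not part of
git):

```c
#include <assert.h>
#include <stddef.h>

/* Callback answering "does resume work at this commit?" (1 = yes, 0 = no). */
typedef int (*works_fn)(size_t commit_index);

/*
 * Toy linear history, oldest first. Index 0 stands in for the known-good
 * commit and the last index for the known-bad one. Purely illustrative data.
 */
static const int toy_history[] = { 1, 1, 1, 0, 0, 0, 0, 0 };

int toy_commit_works(size_t commit_index)
{
    return toy_history[commit_index];
}

/*
 * Invariant: `good` works, `bad` does not, and good < bad.
 * Each step tests the midpoint and halves the suspect range.
 * Returns the index of the first failing commit.
 */
size_t first_bad_commit(size_t good, size_t bad, works_fn works)
{
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (works(mid))
            good = mid;   /* offender is later than mid */
        else
            bad = mid;    /* offender is mid or earlier */
    }
    return bad;
}

/* Small self-check on the toy history above. */
void bisect_selfcheck(void)
{
    size_t last = sizeof(toy_history) / sizeof(toy_history[0]) - 1;

    assert(first_bad_commit(0, last, toy_commit_works) == 3);
}
```

If the second offender had gone in before the fix, the good/bad pattern
would no longer be monotonic and a search like this could converge on the
wrong commit, which is why confirming that the fix commit itself works is
the first step.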

-- 
Jens Axboe

-
To: Ingo Molnar <mingo@...>
Cc: Pavel Machek <pavel@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michael S. Tsirkin <mst@...>, Thomas Gleixner <tglx@...>, <linux-pm@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Daniel Walker <dwalker@...>
Date: Tuesday, February 27, 2007 - 7:15 am

f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38 works for me, starting bisect.

-- 
Jens Axboe

-
To: Jens Axboe <jens.axboe@...>
Cc: Pavel Machek <pavel@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michael S. Tsirkin <mst@...>, Thomas Gleixner <tglx@...>, <linux-pm@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Daniel Walker <dwalker@...>
Date: Thursday, March 1, 2007 - 5:34 am

update: f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38 works for me too, and 
01363220f5d23ef68276db8974e46a502e43d01d is broken. I too will attempt 
to bisect this.

	Ingo
-
To: Ingo Molnar <mingo@...>
Cc: Jens Axboe <jens.axboe@...>, Pavel Machek <pavel@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Thomas Gleixner <tglx@...>, <linux-pm@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Daniel Walker <dwalker@...>
Date: Monday, March 5, 2007 - 11:34 am

f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38 works for me, too.

-- 
MST
-
To: Ingo Molnar <mingo@...>
Cc: Jens Axboe <jens.axboe@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Michael S. Tsirkin <mst@...>, Thomas Gleixner <tglx@...>, <linux-pm@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Daniel Walker <dwalker@...>
Date: Friday, March 2, 2007 - 6:07 am

Strange; on my x60, suspend to ram works okay.

(Resume is very slow, because the disks are not spun up properly; and
there's something wrong with timers; console beeps take way too long).

dmesg attached.

That's with

commit 7b965e0884cee430ffe5dc81cdb117b9316b0549
tree 754dce6432258e0a8c3a758e13a34eb3a1d22ee1
parent 5a39e8c6d655b4fe8305ef8cc2d9bbe782bfee5f
author Trond Myklebust <Trond.Myklebust@netapp.com> Wed, 28 Feb 2007 20:13:55 -0800
committer Linus Torvalds <torvalds@woody.linux-foundation.org> Thu, 01 Mar 2007 14:53:39 -0800

    [PATCH] VM: invalidate_inode_pages2_range() should not exit early

    Fix invalidate_inode_pages2_range() so that it does not immediately exit
    just because a single page in the specified range could not be removed.

    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To: Pavel Machek <pavel@...>
Cc: Ingo Molnar <mingo@...>, Jens Axboe <jens.axboe@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Thomas Gleixner <tglx@...>, <linux-pm@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Daniel Walker <dwalker@...>
Date: Monday, March 5, 2007 - 4:42 am

Pavel, I tried with your .config, and indeed the system came back to life
2-3 minutes after I pressed Fn/F4, so the issue does seem to be with the disk.
It could be that the same takes place with my original .config - maybe
I just wasn't patient enough. I'll need to re-test that.

However, I noticed that, after resume, when the system is presumably functional,
if I try to suspend to ram again, this second suspend hangs, displaying
the following on screen:

[   17.170000] ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 20
[   17.170000] PCI: Setting latency timer of device 0000:02:00.0 to 64
[   17.250000] e1000: 0000:02:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x1) 00:16:41:54:6c:47
[   17.330000] e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection

The crescent LED starts blinking and does not seem to stop for at least 10 min;
I ran out of patience after that. It could be that it's just very slow again.

Pavel, did you try suspend to RAM after a successful resume from RAM?

Under 2.6.20, the system suspends/resumes to memory within about 20 sec
any number of times.

-- 
MST
-
To: Michael S. Tsirkin <mst@...>
Cc: Daniel Walker <dwalker@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Thomas Gleixner <tglx@...>, Andrew Morton <akpm@...>, Jens Axboe <jens.axboe@...>, <linux-pm@...>, Ingo Molnar <mingo@...>, Linus Torvalds <torvalds@...>, Linux Kernel Mailing List <linux-kernel@...>, Adrian Bunk <bunk@...>
Date: Friday, March 9, 2007 - 2:44 am

Seems to work ok in -rc3... as long as I do not mix s2ram with s2disk.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To: Michael S. Tsirkin <mst@...>
Cc: Pavel Machek <pavel@...>, Jens Axboe <jens.axboe@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Thomas Gleixner <tglx@...>, <linux-pm@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Jeff Garzik <jgarzik@...>, Auke Kok <auke-jan.h.kok@...>
Date: Monday, March 5, 2007 - 6:11 am

the spin-up takes a few seconds here under suspend/resume simulation:

 | ata1: waiting for device to spin up (7 secs)
 | Restarting tasks ... done.

 [5-10 seconds pass]

 | ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
 | ata1.00: configured for UDMA/100
 | SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
 | sda: Write Protect is off
 | sda: Mode Sense: 00 3a 00 00
 | SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA

with a real resume it takes even longer - but i don't see where the 
delays come from in that case - i suspect it's SATA.

i'm also getting this WARN_ON() from e1000:

BUG: at drivers/pci/msi.c:611 pci_enable_msi()
 [<c01061bd>] show_trace_log_lvl+0x19/0x2e
 [<c01062b6>] show_trace+0x12/0x14
 [<c01062cc>] dump_stack+0x14/0x16
 [<c024fcc4>] pci_enable_msi+0x6d/0x203
 [<c02b709e>] e1000_request_irq+0x2e/0xe2
 [<c02bb742>] e1000_resume+0x7f/0xef
 [<c0249a68>] pci_device_resume+0x1a/0x44
 [<c02b39ec>] resume_device+0xf7/0x16f
 [<c02b3adb>] dpm_resume+0x77/0xcb
 [<c02b3b69>] device_resume+0x3a/0x51
 [<c014e669>] enter_state+0x193/0x1bb
 [<c014e712>] state_store+0x81/0x97
 [<c01b68bc>] subsys_attr_store+0x20/0x25
 [<c01b6feb>] sysfs_write_file+0xce/0xf6
 [<c017e16b>] vfs_write+0xb1/0x13a
 [<c017e899>] sys_write+0x3d/0x61
 [<c0105220>] syscall_call+0x7/0xb

seems harmless because it seems to work fine.

	Ingo
-
To: Ingo Molnar <mingo@...>
Cc: Michael S. Tsirkin <mst@...>, Pavel Machek <pavel@...>, Jens Axboe <jens.axboe@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, <linux-pm@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Jeff Garzik <jgarzik@...>, Auke Kok <auke-jan.h.kok@...>
Date: Tuesday, March 6, 2007 - 12:26 pm

SATA has another nice feature. Somehow there is an interrupt pending on
the SATA controller, which comes in somewhere in the middle of resume.
If it happens before the SATA code resumed, the SATA code ignores the
interrupt and the interrupt is disabled due to "nobody cared", which in
turn prevents SATA from ever becoming functional again.

Any idea on that one ?

	tglx


-
To: Thomas Gleixner <tglx@...>
Cc: Ingo Molnar <mingo@...>, Michael S. Tsirkin <mst@...>, Pavel Machek <pavel@...>, Jens Axboe <jens.axboe@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, <linux-pm@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Jeff Garzik <jgarzik@...>, Auke Kok <auke-jan.h.kok@...>
Date: Tuesday, March 6, 2007 - 12:52 pm

Jeff - that sounds like a SATA bug.

If you have an interrupt handler registered, you'd better handle the 
interrupt regardless of whether you think the hardware might be gone or 
not.

It's generally *not* ok to do

	if (device_offline())
		return IRQ_NONE;

at the top of an interrupt handler. 

Of course, if you think the hardware is supposed to be quiescent, then the 
only thing you should do is generally just do the "shut up" operation (ie 
read status, write it back or whatever). You must generally *not* try to 
pass any data upwards (ie if the higher layers have told you to shut up, 
you may need to handle the hardware, but you must not involve the higher 
layers themselves any more, because they expect you to be quiet).
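
The difference between the two handler styles described above can be
sketched in plain userspace C. This is only a toy model under stated
assumptions: `struct fake_dev`, `handler_ignore()`, `handler_ack()`,
`deliver_level_irq()` and the `UNHANDLED_LIMIT` threshold are all
hypothetical names invented for illustration, not libata or kernel
interfaces, and the kernel's real "nobody cared" heuristic is more involved
than a simple counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED return values. */
enum irq_result { IRQ_NONE_SIM = 0, IRQ_HANDLED_SIM = 1 };

/* Hypothetical device model: not a real driver structure. */
struct fake_dev {
    uint32_t status;     /* pending-interrupt bits latched by the "hardware" */
    bool quiescent;      /* upper layers told the driver to stay quiet */
    unsigned delivered;  /* events passed up to the upper layers */
};

/* Always quiets the hardware; passes data upwards only when not quiescent. */
enum irq_result handler_ack(struct fake_dev *dev)
{
    uint32_t pending = dev->status;   /* read status */

    if (pending == 0)
        return IRQ_NONE_SIM;          /* genuinely not our interrupt */

    dev->status &= ~pending;          /* write it back: the "shut up" step */
    if (!dev->quiescent)
        dev->delivered++;             /* involve upper layers only if allowed */
    return IRQ_HANDLED_SIM;
}

/* The discouraged pattern: bail out early and leave the line asserted. */
enum irq_result handler_ignore(struct fake_dev *dev)
{
    if (dev->quiescent)
        return IRQ_NONE_SIM;
    return handler_ack(dev);
}

/* Illustrative threshold only; an assumption, not the kernel's value. */
#define UNHANDLED_LIMIT 1000u

/*
 * Models a level-triggered line: the handler is re-invoked for as long as
 * the device keeps status bits set. Returns false if the line ends up
 * disabled because too many invocations went unhandled.
 */
bool deliver_level_irq(struct fake_dev *dev,
                       enum irq_result (*handler)(struct fake_dev *))
{
    unsigned unhandled = 0;

    while (dev->status != 0) {
        if (handler(dev) == IRQ_HANDLED_SIM)
            continue;
        if (++unhandled >= UNHANDLED_LIMIT)
            return false;             /* "nobody cared": line disabled */
    }
    return true;
}

/* Small self-check contrasting the two handlers on a suspended device. */
void irq_model_selfcheck(void)
{
    struct fake_dev a = { .status = 0x1, .quiescent = true, .delivered = 0 };
    struct fake_dev b = { .status = 0x1, .quiescent = true, .delivered = 0 };

    assert(!deliver_level_irq(&a, handler_ignore)); /* line gets disabled */
    assert(deliver_level_irq(&b, handler_ack));     /* line survives */
    assert(b.delivered == 0);                       /* nothing passed upwards */
}
```

In this model the early-return handler loses the interrupt line for good,
while the acknowledging handler keeps it alive and still passes nothing up
to the layers that asked for quiet.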

And if you cannot do that because you need to resume in order to have the 
status register mapped, then you need to have an "early_resume()" function 
which gets called *before* interrupts are enabled. That's what 
early-resume (and late-suspend) are designed for: doing things that happen 
very early in the resume sequence before everything is up.

And if you don't want to do any of these things (or are unable to, because 
of some ordering constraint or bad design), then you simply need to 

Jeff, Auke, does this ring any bells?

		Linus
-
To: Linus Torvalds <torvalds@...>
Cc: Thomas Gleixner <tglx@...>, Ingo Molnar <mingo@...>, Michael S. Tsirkin <mst@...>, Pavel Machek <pavel@...>, Jens Axboe <jens.axboe@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, <linux-pm@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Jeff Garzik <jgarzik@...>
Date: Tuesday, March 6, 2007 - 1:09 pm

For the e1000 issue, the problem is solved with Eric Biederman's 3-patch msi 
cleanups. You should have another message in your mailbox confirming that I 
tested his patches and the MSI warning for e1000 suspend-resume is gone with them.

Cheers,

Auke
-
To: Michael S. Tsirkin <mst@...>
Cc: Pavel Machek <pavel@...>, Jens Axboe <jens.axboe@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Thomas Gleixner <tglx@...>, <linux-pm@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Jeff Garzik <jgarzik@...>, Auke Kok <auke-jan.h.kok@...>
Date: Tuesday, March 6, 2007 - 5:06 am

update: Thomas' PIT/HPET resume-fix patch fixed the delay for me.

	Ingo
-
To: Ingo Molnar <mingo@...>
Cc: Michael S. Tsirkin <mst@...>, Pavel Machek <pavel@...>, Jens Axboe <jens.axboe@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Thomas Gleixner <tglx@...>, <linux-pm@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Auke Kok <auke-jan.h.kok@...>
Date: Tuesday, March 6, 2007 - 1:30 am

I would poke Eric Biederman(sp?) about this one.  Maybe it's even solved 
by the MSI-enable-related patch he posted in the past 24-48 hours.

	Jeff


-
To: Jeff Garzik <jeff@...>, Linus Torvalds <torvalds@...>
Cc: Ingo Molnar <mingo@...>, Michael S. Tsirkin <mst@...>, Pavel Machek <pavel@...>, Jens Axboe <jens.axboe@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Thomas Gleixner <tglx@...>, <linux-pm@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Eric W. Biederman <ebiederm@...>
Date: Tuesday, March 6, 2007 - 2:35 am

Eric, Linus,

I tried the 3-patch series "[PATCH 0/3] Basic msi bug fixes.." and they fix this 
problem for me. Were you expecting the OOPS in the first place? In any case, it 
survived several suspend/resume cycles on both enabled (irq alloc'd and enabled) 
and disabled devices (only initialized).

Jens Axboe was seeing the same problem, perhaps he can confirm the fix as well.

In any case, the patches have my blessing :)

Please add my:

   Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>


Cheers,

Auke
-
To: Kok, Auke <auke-jan.h.kok@...>
Cc: Jeff Garzik <jeff@...>, Linus Torvalds <torvalds@...>, Michael S. Tsirkin <mst@...>, Pavel Machek <pavel@...>, Jens Axboe <jens.axboe@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Thomas Gleixner <tglx@...>, <linux-pm@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Eric W. Biederman <ebiederm@...>
Date: Tuesday, March 6, 2007 - 5:04 am

the bug was the warning message (a WARN_ON()) above - not an oops. So 
that warning message is gone in your testing?

	Ingo
-
To: Ingo Molnar <mingo@...>
Cc: Kok, Auke <auke-jan.h.kok@...>, Jeff Garzik <jeff@...>, Linus Torvalds <torvalds@...>, Michael S. Tsirkin <mst@...>, Pavel Machek <pavel@...>, Jens Axboe <jens.axboe@...>, Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, Thomas Gleixner <tglx@...>, <linux-pm@...>, Michal Piotrowski <michal.k.k.piotrowski@...>, Eric W. Biederman <ebiederm@...>