Ok, the merge window for 2.6.21 has closed, and -rc1 is out there.
There's a lot of changes, as is usual for an -rc1 thing, but at least so
far it would seem that 2.6.20 has been a good base, and I don't think we
have anything *really* scary here.The most interesting core change may be the dyntick/nohz one, where timer
ticks will only happen when needed. It's been brewing for a _loong_ time,
but it's in the standard kernel now as an option.But there's a ton of architecture updates (arm, mips, powerpc, x86, you
name it), ACPI updates, and lots of driver work. And just a lot of
cleanups.Have fun,
Linus
-
I'm getting an undefined symbol with CONFIG_AGP=m:
WARNING: "compat_agp_ioctl" [drivers/char/agp/agpgart.ko] undefined!
Andreas.
--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
-
On Wed, Feb 21, 2007 at 07:34:01PM +0100, Andreas Schwab wrote:
> I'm getting an undefined symbol with CONFIG_AGP=m:
>
> WARNING: "compat_agp_ioctl" [drivers/char/agp/agpgart.ko] undefined!Fix went to Linus an hour ago.
It's been in -mm for a week, and agpgart.git for a day or so.Dave
This email lists some known regressions in 2.6.21-rc1 compared to 2.6.20
that are not yet fixed in Linus' tree.If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affectected subsystem or driver, a patch
of you caused a breakage or I'm considering you in any other way possibly
involved with one or more of these issues.Due to the huge amount of recipients, please trim the Cc when answering.
Subject : forcedeth no longer works
References : http://bugzilla.kernel.org/show_bug.cgi?id=8090
Submitter : David P. Reed <dpreed@reed.com>
Caused-By : Ayaz Abdulla <aabdulla@nvidia.com>
Status : unknownSubject : forcedeth: skb_over_panic
References : http://bugzilla.kernel.org/show_bug.cgi?id=8058
Submitter : Albert Hopkins <kernel@marduk.letterboxes.org>
Status : unknownSubject : natsemi ethernet card not detected correctly
References : http://lkml.org/lkml/2007/2/23/4
http://lkml.org/lkml/2007/2/23/7
Submitter : Bob Tracy <rct@gherkin.frus.com>
Caused-By : Mark Brown <broonie@sirena.org.uk>
Handled-By : Mark Brown <broonie@sirena.org.uk>
Patch : http://lkml.org/lkml/2007/2/23/142
Status : patch availableSubject : ThinkPad T60: system doesn't come out of suspend to RAM
(CONFIG_NO_HZ)
References : http://lkml.org/lkml/2007/2/22/391
Submitter : Michael S. Tsirkin <mst@mellanox.co.il>
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar <mingo@elte.hu>
Status : unknownSubject : kernel BUG at kernel/time/tick-sched.c:168 (CONFIG_NO_HZ)
References : http://lkml.org/lkml/2007/2/16/346
Submitter : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Status : problem is being debuggedSubject : BUG: soft lockup detected on CPU#0
NOHZ: local_softirq_pending 20 (SMT scheduler)
References : h...
Adrian,
The BUG_ON() was replaced by a warning printk(). The BUG_ON() exposed a
Patch available, not confirmed yet.
tglx
-
I can confirm that the bug is fixed (over 20 hours of testing should be enough).
Huge thanks!
Regards,
Michal--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-
^^^^ almost ;)
Regards,
Michal--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-
thanks alot! I think this thing was a long-term performance/latency
regression in HT scheduling as well.Ingo
-
Agreed.
I was recently looking at that spot because I found that niced tasks
were taking latency hits, and disabled it, which helped a bunch. I also
can't understand why it would be OK to interleave a normal task with an
RT task sometimes, but not others.. that's meaningless to the RT task.IMHO, SMT scheduling should be a buyer beware thing. Maximizing your
core utilization comes at a price, but so does disabling it, so I think
letting the user decide what he wants is the right thing to do.-Mike
-
Apologies for the resend, lkml address got mangled...
Ingo I'm going to have to partially disagree with you on this.
This has only become a problem because of what happens with dynticks now when
rq->curr == rq->idle. Prior to this, that particular SMT code only leads to
relative delays in scheduling for lower priority tasks. Whether or not that
task is ksoftirqd should not matter because it is not like they are starved
indefinitely, it is only that nice 19 tasks are relatively delayed, which by
definition is implied with the usage of nice as a scheduler hint wouldn't you
say? I know it has been discussed many times before as to whether 'nice'
means less cpu and/or more latency, but in our current implementation, nice
means both less cpu and more latency. So to me, the kernels without dynticks
do not have a regression. This seems to only be a problem in the setting of
the new dynticks code IMHO. That's not to say it isn't a bug! Nor am I saying
that dynticks is a problem! Please don't misinterpret that.The second issue is that this is a problem because of the fuzzy definition of
what idle is for a runqueue in the setting of this SMT code. Normally,
rq->curr==rq->idle means the runqueue is idle, but not with this code since
there are still rq->nr_running on that runqueue. What dynticks in this
implementation is doing is trying to idle a hyperthread sibling on a cpu
whose logical partner is busy. I did not find that added any power saving on
my earlier dynticks implementation, and found it easier to keep that sibling
ticking at the same rate as its partner. Of course you may have found
something different, and I definitely agree with what you are likely to say
in response to this- we shouldn't have to special case logical siblings as
having a different definition of idle than any other smp case. Ultimately,
that leaves us with your simple patch as a reasonable solution for the
dynticks case even though it does change the behaviour dramatically...
(hrmph. having to copy/paste/try again. evolution seems to be broken..
RCPT TO <linux-kernel@vg.kolivas.org> failed: Cannot resolve your domain {mp049}
..caused me to be unable to send despite receipts being disabled)No I'm not, but let's go further in that direction just for the sake of
argument. You're then saying that you prefer realtime priorities to not
work in the HT setting, given that realtime tasks don't participate in
the 'single stream me' program.I'm saying only that we're defeating the purpose of HT, and overriding a
So? User asked for HT. That's hardware multiplexing. It ain't free.
I don't think it does actually. Let your RT task sleep regularly, and
ever so briefly. We don't evict lower priority tasks from siblings uponTo me, the reason for interleaving is solely about keeping the core
Re-read this paragraph with realtime task priorities in mind, or for
that matter, dynamic priorities. If you carry your priority/throughput
argument to it's logical conclusion, only instruction streams of
absolutely equal priority should be able to share the core at any given
time. You may as well just disable HT and be done with it.To me, siblings are logically separate units, and should be treated as
such (as they mostly are). They share an important resource, but so do
physically discrete units.-Mike
-
Where do I say that? I do not presume to manage realtime priorities in any
way. You're turning my argument about nice levels around and somehow saying
that because hyperthreading breaks the single stream me semantics by
parallelising them that I would want to stop that happening. Nowhere have I
argued that realtime semantics should be changed to somehow work around
hyperthreading. SMT nice is about managing nice only, and not realtimeBut the buyer is not aware. You are aware because you tinker, but the vast
majority of users who enable hyperthreading in their shiny pcs are not aware.
The only thing they know is that if they enable hyperthreading their programs
run slower in multitasking environments no matter how much they nice the
other processes. Buyers do not buy hardware knowing that the internal design
breaks something as fundamental as 'nice'. You seem to presume that most
people who get hyperthreading are happy to compromise 'nice' in order to get
their second core working and I put it to you that they do not make thatWell you know as well as I do that you're selecting out the exception rather
Well that's certainly taking my logic for a ride. This is about managing
_nice_ and _only_ nice. Nice specifies fixed interval static priorities where
(in the risk of repeating myself) you are specifying that higher nice values
tasks you wish to receive less cpu and more latency. Dynamic priorities have
absolutely no effect on what the discrepancies are between the static
priorities of differing nice values. As for realtime priorities, again, I do
not presume to be managing them with SMT nice. They are unique entities
unrelated to nice values. The only thing they have in common with nice levels
is that if something is running without a realtime priority, it should be
preempted by the realtime task as you have specified that the realtime task
should receive all the cpu over the non-realtime task. I don't pretend that
there is some cpu percentage relations...
I see no real difference between the two assertions. Nice is just a
mechanism to set priority, so I applied your assertion to a different
range of priorities than nice covers, and returned it to show that the
code contradicts itself. It can't be bad for a nice 1 task to run with
a nice 0 task, but OK for a minimum RT task to run with a maximum RT
task. Iff HT without corrective measures breaks nice, then it breaksTo me it's pretty much black and white. Either you want to split your
cpu into logical units, which means each has less to offer than theI don't agree that it's the exception, and if you look at this HT thing
from the split cpu perspective, I'm not sure there's even a problem.Scrolling down, I see that this is getting too long, and we aren't
The above will have to do.
-Mike
-
i'm starting to lean towards your view that we should not artificially
keep tasks from running, when there's a free CPU available. We should
still keep the 'other half' of SMT scheduling: the immediate pushing of
tasks to a related core, but this bit of 'do not run tasks on this CPU'
dependent-sleeper logic is i think a bit fragile. Plus these days SMT
siblings do not tend to influence each other in such a negative way as
older P4 ones where a HT sibling would slow down the other sibling
significantly.plus with an increasing number of siblings (which seems like an
inevitable thing on the hardware side), the dependent-sleeper logic
becomes less and less scalable. We'd have to cross-check every other
'related' CPU's current priority to decide what to run.if then there should be a mechanism /in the hardware/ to set the
priority of a CPU - and then the hardware could decide how to prioritize
between siblings. Doing this in software is really hard.Ingo
-
Well it is meant to be tuned to the cpu type in per_cpu_gain. So it should be
easy to be set to the appropriate scaling. It was never meant to be a oneYes even I've commented before that this current system is unworkable come
multiple shared power threads. This I do see as a real problem with it - inAnd that's the depressing part because of course I was interested in that as
the original approach to the problem (and it was a big problem). When I spoke
to Intel and AMD (of course to date no SMT AMD chip exists) at kernel summit
they said it was too hard to implement hardware priorities well. Which is
real odd since IBM have already done it with Power...Still I think it has been working fine in software till now, but now it has to
deal with the added confusion of dynticks, so I already know what will happen
to it.Hrm it's been a good time for my code all round... I think I'll just swap
prefetch myself up the staircase to some pluggable scheduler that would
hyperthread me to sleep as an idle priority task.--
-ck
-
Well, it's not a dyntick problem in the first place. Even w/o dynticks
we go idle with local_softirq_pending(). Dynticks contains an explicit
check for that, which makes it visible.tglx
-
Oops I'm sorry if I made it sound like there's a dynticks problem. That was
not my intent and I said as much in an earlier email. Even though I'm finding
myself defending code that has already been softly tagged for redundancy,
let's be clear here; we're talking about at most a further 70ms delay in
scheduling a niced task in the presence of a nice 0 task, which is a
reasonable delay for ksoftirqd which we nice the eyeballs out of in mainline.
Considering under load our scheduler has been known to cause scheduling
delays of 10 seconds I still don't see this as a bug. Dynticks just "points
it out to us".--
-ck
-
well, not running softirqs when we could is a bug. It's not a big bug,
but it's a bug nevertheless. It doesnt matter that softirqs could be
delayed even worse under high load - there was no 'high load' here.Ingo
-
Gotcha. I'll prepare a smt-nice removal patch shortly.
--
-ck
-
Remove the SMT-nice feature which idles sibling cpus on SMT cpus to
facilitiate nice working properly where cpu power is shared. The idling
of cpus in the presence of runnable tasks is considered too fragile, easy
to break with outside code, and the complexity of managing this system if an
architecture comes along with many logical cores sharing cpu power will be
unworkable.Remove the associated per_cpu_gain variable in sched_domains used only by
this code.Signed-off-by: Con Kolivas <kernel@kolivas.org>
---
include/asm-i386/topology.h | 1
include/asm-ia64/topology.h | 2
include/asm-mips/mach-ip27/topology.h | 1
include/asm-powerpc/topology.h | 1
include/asm-x86_64/topology.h | 1
include/linux/sched.h | 1
include/linux/topology.h | 4
kernel/sched.c | 155 ----------------------------------
8 files changed, 1 insertion(+), 165 deletions(-)Index: linux-2.6.21-rc2/kernel/sched.c
===================================================================
--- linux-2.6.21-rc2.orig/kernel/sched.c 2007-03-02 08:56:45.000000000 +1100
+++ linux-2.6.21-rc2/kernel/sched.c 2007-03-02 08:58:40.000000000 +1100
@@ -3006,23 +3006,6 @@ static inline void idle_balance(int cpu,
}
#endif-static inline void wake_priority_sleeper(struct rq *rq)
-{
-#ifdef CONFIG_SCHED_SMT
- if (!rq->nr_running)
- return;
-
- spin_lock(&rq->lock);
- /*
- * If an SMT sibling task has been put to sleep for priority
- * reasons reschedule the idle task to see if it can now run.
- */
- if (rq->nr_running)
- resched_task(rq->idle);
- spin_unlock(&rq->lock);
-#endif
-}
-
DEFINE_PER_CPU(struct kernel_stat, kstat);EXPORT_PER_CPU_SYMBOL(kstat);
@@ -3239,10 +3222,7 @@ void scheduler_tick(void)update_cpu_clock(p, rq, now);
- if (p == rq->idle)
- /* Task on the idle queue */
- wake_priority_sleeper(rq);
- else
+ if (p != rq->idl...
Well, dyntick might end up to delay it for X seconds as well, which _is_
observable and that's why the check was put there in the first place.tglx
-
This email lists some known regressions in 2.6.21-rc1 compared to 2.6.20
that are not yet fixed in Linus' tree.If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affectected subsystem or driver, a patch
of you caused a breakage or I'm considering you in any other way possibly
involved with one or more of these issues.Due to the huge amount of recipients, please trim the Cc when answering.
Subject : resume: slab error in verify_redzone_free(): cache `size-512':
memory outside object was overwritten
References : http://lkml.org/lkml/2007/2/24/41
Submitter : Pavel Machek <pavel@ucw.cz>
Handled-By : Marcel Holtmann <marcel@holtmann.org>
Status : unknownSubject : ThinkPad T60: no screen after suspend to RAM
References : http://lkml.org/lkml/2007/2/22/391
Submitter : Michael S. Tsirkin <mst@mellanox.co.il>
Handled-By : Ingo Molnar <mingo@elte.hu>
Status : unknownSubject : HP nx6325 notebook: usb mouse stops working after suspend to ram
References : http://lkml.org/lkml/2007/2/21/413
Submitter : Arkadiusz Miskiewicz <arekm@maven.pl>
Caused-By : Konstantin Karasyov <konstantin.a.karasyov@intel.com>
commit 0a6139027f3986162233adc17285151e78b39cac
Status : unknownSubject : ACPI update breaks kpowersave
References : http://lkml.org/lkml/2007/2/10/7
Submitter : Ismail Dönmez <ismail@pardus.org.tr>
Fabio Comolli <fabio.comolli@gmail.com>
Status : unknownSubject : MacBook: AE_NOT_FOUND ACPI messages
References : http://bugzilla.kernel.org/show_bug.cgi?id=8066
Submitter : Thomas Meyer <thomas.mey@web.de>
Status : unknownSubject : Asus A8N-VM motherboard:
framebuffer/console boot failure boot failure (ACPI related)
References : http://lkml.org/lkml/2007/2/23/132
Submitter : Andrew Nelless <andrew@nelless.net>
Handled-By : Antonino A. Daplas <...
Still appears, but this does not seem to be 40/80 pin cable problem to
be but rather ata-piix calling some acpi methods and this rulsts in acpi
errors.--
Meelis Roos (mroos@linux.ee)
-
On Tue, 27 Feb 2007 15:00:29 +0200 (EET)
There are two separate problems showing up in the one trace - broken ACPI
spew and wrong cable detect. I don't think they are related
-
Just reproduced this in -rc2.
Another thing I noticed:
with 2.6.20, pressing Fn/F4 generates an ACPI event and triggers suspend to RAM.On 2.6.21-rc2, after resume (when the box is accessible from network),
pressing Fn/F4 again does not seem to have any effect.--
MST
-
I have the same problem on my IBM X60s on rc1 and rc2. Can't resume
from RAM, can't suspend to disk. It is possible to revert all the
changes to ACPI and test it?Jeff.
-
cu
Adrian--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed-
Yes.
-
I've tried with CONFIG_KVM=n and CONFIG_KVM=y and both does not suspend.
-
Do you mean that they "do not resume after suspend"?
--
MST
-
I can't even suspend to disk/ram. It just hangs and the lights just
blink and everything else hangs. With 2.6.20, it works fine.Jeff.
-
Turn up console loglevel, and see where it hangs...
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
I think CONFIG_DISABLE_CONSOLE_SUSPEND would have to be set for this purpose
too.Greetings,
Rafael
-
As I said elsewhere in the thread, suspend/resume to RAM works ok on
my thinkpad x60. I posted my .config there, perhaps difference is in
it? Ingo identified KVM as possible culprit.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
I'll try your .config for kicks, the problem that Ingo pin pointed is
not what is affecting me.--
Jens Axboe-
Can you please get the dmesg output after resume via the network ?
tglx
-
The link above has it.
--
MST
-
This email lists some known regressions in 2.6.21-rc1 compared to 2.6.20
that are not yet fixed in Linus' tree.If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affectected subsystem or driver, a patch
of you caused a breakage or I'm considering you in any other way possibly
involved with one or more of these issues.Due to the huge amount of recipients, please trim the Cc when answering.
Subject : forcedeth: skb_over_panic
References : http://bugzilla.kernel.org/show_bug.cgi?id=8058
Submitter : Albert Hopkins <kernel@marduk.letterboxes.org>
Status : unknownSubject : natsemi ethernet card not detected correctly
References : http://lkml.org/lkml/2007/2/23/4
http://lkml.org/lkml/2007/2/23/7
Submitter : Bob Tracy <rct@gherkin.frus.com>
Caused-By : Mark Brown <broonie@sirena.org.uk>
Handled-By : Mark Brown <broonie@sirena.org.uk>
Patch : http://lkml.org/lkml/2007/2/23/142
Status : patch availableSubject : request_module: runaway loop modprobe net-pf-1
References : http://lkml.org/lkml/2007/2/21/206
Submitter : YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
Caused-By : Kay Sievers <kay.sievers@vrfy.org>
commit c353c3fb0700a3c17ea2b0237710a184232ccd7f
Handled-By : Greg KH <greg@kroah.com>
Status : problem is being discussedSubject : IPV6=m, SUNRPC=y compile error
References : http://bugzilla.kernel.org/show_bug.cgi?id=8050
http://lkml.org/lkml/2007/2/12/442
http://lkml.org/lkml/2007/2/20/384
Submitter : Michael-Luke Jones <mlj28@cam.ac.uk>
Pete Clements <clem@clem.clem-digital.net>
Sid Boyce <g3vbv@blueyonder.co.uk>
Caused-By : Chuck Lever <chuck.lever@oracle.com>
Handled-By : YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
Status : patch availableSubject : WARNING: "compat_agp_ioctl" undefin...
Patch has been reverted and submitted to Linus to pull, but he's out of
town right now...thanks,
greg k-h
-
This email lists some known regressions in 2.6.21-rc1 compared to 2.6.20
that are not yet fixed in Linus' tree.If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affectected subsystem or driver, a patch
of you caused a breakage or I'm considering you in any other way possibly
involved with one or more of these issues.Due to the huge amount of recipients, please trim the Cc when answering.
Subject : ThinkPad T60: system doesn't come out of suspend to RAM
(CONFIG_NO_HZ)
References : http://lkml.org/lkml/2007/2/22/391
Submitter : Michael S. Tsirkin <mst@mellanox.co.il>
Thomas Gleixner <tglx@linutronix.de>
Handled-By : Ingo Molnar <mingo@elte.hu>
Status : unknownSubject : kernel BUG at kernel/time/tick-sched.c:168 (CONFIG_NO_HZ)
References : http://lkml.org/lkml/2007/2/16/346
Submitter : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Status : problem is being debuggedSubject : BUG: soft lockup detected on CPU#0
NOHZ: local_softirq_pending 20
References : http://lkml.org/lkml/2007/2/20/257
Submitter : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar <mingo@elte.hu>
Status : problem is being debuggedSubject : i386: no boot with nmi_watchdog=1 (clockevents)
References : http://lkml.org/lkml/2007/2/21/208
Submitter : Daniel Walker <dwalker@mvista.com>
Caused-By : Thomas Gleixner <tglx@linutronix.de>
commit e9e2cdb412412326c4827fc78ba27f410d837e6e
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Status : problem is being debugged-
x60 doesn't resume from S2R either, it doesn't matter if CONFIG_NO_HZ is
set or not though. 2.6.20 worked fine.--
Jens Axboe-
Is this
Subject : ThinkPad T60: no screen after suspend to RAM
References : http://lkml.org/lkml/2007/2/22/391
Submitter : Michael S. Tsirkin <mst@mellanox.co.il>
Ingo Molnar <mingo@elte.hu>
Handled-By : Ingo Molnar <mingo@elte.hu>
Status : unknowncu
Adrian--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed-
It doesn't resume at all.
--
Jens Axboe-
It somehow works for me. As long as I do not play with bluetooth and
suspend to disk...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
It locks solid here on resume, going back to 2.6.20 makes it work
perfectly again. In between 2.6.20 and 2.6.21-rc1 some ACPI change broke
resume, but that got fixed. Some other change later snuck in that broke
it AGAIN for me, sigh.I don't use bluetooth nor suspend to disk.
--
Jens Axboe-
resume is stock on my T60 too. So you mean v2.6.21-rc1 vanilla works
fine? Do you know a commit ID that works for sure? I'd like to bisect
this, but this way i might just find that ACPI change that got already
fixed later on (and then got re-broken).Ingo
-
Nope, 2.6.21-rc1 vanilla does not work. 2.6.20 works. 2.6.20-gitX worked
until some acpi change broke it, the below patch fixed that for me. That
got merged in a later 2.6.20-gitY, but then some other patch broke it
again so that 2.6.21-rc1 is broken. Not much luck there :-)So it looks like:
- c5a7156959e89b32260ad6072bbf5077bcdfbeee broke 2.6.20-git
- f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38 should fix that.
- Something later than f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38 broke itYeah, it gets trickier. I'll try
f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38 now and see if that works, then
bisect to 2.6.21-rc1 to find the other offender. I hope the other
offender didn't get added before
f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38, we'll see :-)--
Jens Axboe-
f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38 works for me, starting bisect.
--
Jens Axboe-
update: f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38 works for me too, and
01363220f5d23ef68276db8974e46a502e43d01d is broken. I too will attempt
to bisect this.Ingo
-
f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38 works for me, too.
--
MST
-
Strange; on my x60, suspend to ram works okay.
(Resume is very slow, because disks are not spinned up properly; and
there's something wrong with timers; console beeps take way too long).dmesg attached.
That's with
commit 7b965e0884cee430ffe5dc81cdb117b9316b0549
tree 754dce6432258e0a8c3a758e13a34eb3a1d22ee1
parent 5a39e8c6d655b4fe8305ef8cc2d9bbe782bfee5f
author Trond Myklebust <Trond.Myklebust@netapp.com> Wed, 28 Feb 2007
20:13:55 -0800
committer Linus Torvalds <torvalds@woody.linux-foundation.org> Thu, 01
Mar 2007 14:53:39 -0800[PATCH] VM: invalidate_inode_pages2_range() should not exit early
Fix invalidate_inode_pages2_range() so that it does not
immediately exit
just because a single page in the specified range could not be
removed.Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Pavel, I tried with your .config, and indeed the system came back to life after
2-3 minutes after I press Fn/F4, indeed the issue seems to be with the disk.
It could be that the same takes place with my original .config - maybe
I just wasn't patient enough. I'll need to re-test that.However, I noticed that, after resume, when the system is presumably functional,
if I try to suspend to ram again, this second suspend hangs, displaying
the following on screen:[ 17.170000] ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 20
[ 17.170000] PCI: Setting latency timer of device 0000:02:00.0 to 64
[ 17.250000] e1000: 0000:02:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x1) 00:16:41:5
4:6c:47
[ 17.330000] e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connectionthe crescent LED starts blinking and does not seem to stop for at lest 10 min,
I've run out of patience after that. It could be that it's just very slow again.Pavel, did you try suspend to RAM after a successfull resume from RAM?
Under 2.6.20, the system suspends/resumes to memory within about 20 sec
any number of times.--
MST
-
Seems to work ok in -rc3... as long as I do not mix s2ram with s2disk.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
the spin-up takes a few seconds here under suspend/resume simulation:
| ata1: waiting for device to spin up (7 secs)
| Restarting tasks ... done.[5-10 seconds pass]
| ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
| ata1.00: configured for UDMA/100
| SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
| sda: Write Protect is off
| sda: Mode Sense: 00 3a 00 00
| SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUAwith real resume it takes even longer time - but i dont see where the
delays come from in that case - i suspect it's SATA.i'm also getting this WARN_ON() from e1000:
BUG: at drivers/pci/msi.c:611 pci_enable_msi()
[<c01061bd>] show_trace_log_lvl+0x19/0x2e
[<c01062b6>] show_trace+0x12/0x14
[<c01062cc>] dump_stack+0x14/0x16
[<c024fcc4>] pci_enable_msi+0x6d/0x203
[<c02b709e>] e1000_request_irq+0x2e/0xe2
[<c02bb742>] e1000_resume+0x7f/0xef
[<c0249a68>] pci_device_resume+0x1a/0x44
[<c02b39ec>] resume_device+0xf7/0x16f
[<c02b3adb>] dpm_resume+0x77/0xcb
[<c02b3b69>] device_resume+0x3a/0x51
[<c014e669>] enter_state+0x193/0x1bb
[<c014e712>] state_store+0x81/0x97
[<c01b68bc>] subsys_attr_store+0x20/0x25
[<c01b6feb>] sysfs_write_file+0xce/0xf6
[<c017e16b>] vfs_write+0xb1/0x13a
[<c017e899>] sys_write+0x3d/0x61
[<c0105220>] syscall_call+0x7/0xbseems harmless because it seems to work fine.
Ingo
-
SATA has another nice feature. Somehow there is an interrupt pending on
the SATA controller, which comes in somewhere in the middle of resume.
If it happens before the SATA code resumed, the SATA code ignores the
interrupt and the interrupt is disabled due to "nobody cared", which in
turn prevents SATA to ever become functional again.Any idea on that one ?
tglx
-
Jeff - that sounds like a SATA bug.
If you have an interrupt handler registered, you'd better handle the
interrupt regardless of whether you think the hardware might be gone or
not.It's generally *not* ok to do
if (device_offline())
return IRQ_NONE;at the top of an interrupt handler.
Of course, if you think the hardware is supposed to be quiescent, then the
only thing you should do is generally just do the "shut up" operation (ie
read status, write it back or whatever). You must generally *not* try to
pass any data upwards (ie if the higher layers have told you to shut up,
you may need to handle the hardware, but you must not involve the higher
layers themselves any more, because they expect you to be quiet).And if you cannot do that because you need to resume in order to have the
status register mapped, then you need to have an "early_resume()" function
which gets called *before* interrupts are enabled. That's what
early-resume (and late-suspend) are designed for: doing things that happen
very early in the resume sequence before everything is up.And if you don't want to do any of these things (or are unable to, because
of some ordering constraint or bad design), then you simply need toJeff, Auke, does this ring any bells?
Linus
-
For the e1000 issue, the problem is solved with Eric Biederman's 3-patch msi
cleanups. You should have another message in your mailbox confirming that I
tested his patches and the MSI warning for e1000 suspend-resume is gone with them.Cheers,
Auke
-
update: Thomas' PIT/HPET resume-fix patch fixed the delay for me.
Ingo
-
I would poke Eric Biederman(sp?) about this one. Maybe its even solved
by the MSI-enable-related patch he posted in the past 24-48 hours.Jeff
-
Eric, Linus,
I tried the 3-patch series "[PATCH 0/3] Basic msi bug fixes.." and they fix this
problem for me. Were you expecting the OOPS in the first place? In any case, it
survived several suspend/resume cycles on both enabled (irq alloc'd and enabled)
and disabled devices (only initialized).Jens Axboe was seeing the same problem, perhaps he can confirm the fix as well.
In any case, the patches have my blessing :)
Please add my:
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Cheers,
Auke
-
the bug was the warning message (a WARN_ON()) above - not an oops. So
that warning message is gone in your testing?Ingo
-
yes.
Auke
-
Sorry for the slow delay. I was out of town for my brothers wedding the last few
days.I wasn't exactly expecting the WARN_ON to trigger. What I fixed was
an inconsistency in handling our state bits. Fixing that
inconsistency appears to have fixed the e1000 usage scenario mostly by
accident.The basic issue is that pci_save_state saves the current msi state
along with other registers, and then the e1000 driver goes and
disables the msi irq after we have saved the irq state as on.My code notices that the msi irq was disabled before restore time, so
it skips the restore. However we now have a leak of the msi saved cap
because we are not freeing it.This leaves with some basic questions.
- Does it make sense for suspend/resume methods to request/free irqs?
- Does it make sense for suspend/resume methods to allocate/free msi irqs?
- Do we want pci_save/restore_cap to save/restore msi state?The path of least resistance is to just free the extra state and we
are good. I'm just not quite certain that is sane and it has been a
long day.Eric
-
we used to have a lengthy e1000_pci_save|restore_state in our code, which is now
gone, so I'm all for that. A separate pci_save_pxie|msi(x)_state for every
driver seems completely unnecessary. I can't think of a use case where
saving+restoring everything hurts. That's what you want I presume.We currently free all irq's and msi before going into suspend in e1000, and I
think that is probably a good thing, somehow I can think of bad things happening
if we dont, but I admit that I haven't tried it without alloc/free. We do this
in e100 as well and it works.Another motivation would be to leave this up to the driver: if the driver
chooses to free/alloc interrupts because it makes sense, you probably would want
to keep that choice available. Devices that don't need this can skip the
alloc/free, but leave the choice open for others.hth
Auke
-
ah, looking at the code in e1000 we do:
_suspend:
pci_save_state();
free_irq()_resume:
pci_restore_state();
alloc_irq();I suppose that's not good either, and the major cause of the warning in the
first place.Maybe I can rollback your latest patches and try to fix that mess by postponing
the pci_save_state until after we free'd the irq's.Auke
-
I just want to understand why we have issues and to see if how we have
organized the suspend/resume path for dealing with msi irqs makes sense.That is I haven't looked much at the suspend/resume path so I don't know it
well and I am afraid that your problem might be a symptom of a deeperCurrently the irq code supports operation without the
free_irq/request_irq. Since the numbers given are pure linux
abstractions things should it is really a matter of justBelow is an additional set of warnings that should help debug this.
The old code just got lucky that it triggered a warning when this happens.diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 01869b1..5113913 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -613,6 +613,7 @@ int pci_enable_msi(struct pci_dev* dev)
return -EINVAL;WARN_ON(!!dev->msi_enabled);
+ WARN_ON(!hlist_empty(&dev->saved_cap_space));/* Check whether driver already requested for MSI-X irqs */
if (dev->msix_enabled) {
@@ -638,6 +639,8 @@ void pci_disable_msi(struct pci_dev* dev)
if (!dev->msi_enabled)
return;+ WARN_ON(!hlist_empty(&dev->saved_cap_space));
+
msi_set_enable(dev, 0);
pci_intx(dev, 1); /* enable intx */
dev->msi_enabled = 0;
@@ -739,6 +742,7 @@ int pci_enable_msix(struct pci_dev* dev, struct msix_entry *entries, int nvec)
}
}
WARN_ON(!!dev->msix_enabled);
+ WARN_ON(!hlist_empty(&dev->saved_cap_space));/* Check whether driver already requested for MSI irq */
if (dev->msi_enabled) {
@@ -763,6 +767,8 @@ void pci_disable_msix(struct pci_dev* dev)
if (!dev->msix_enabled)
return;+ WARN_ON(!hlist_empty(&dev->saved_cap_space));
+
msix_set_enable(dev, 0);
pci_intx(dev, 1); /* enable intx */
dev->msix_enabled = 0;
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index bd44a48..4418839 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -677,6 +677,7 @@ pci_restore_state(struct pci_dev *dev)...
Eric W. Biederman wrote:
I'm trying this patch together with the other 2 that you sent out a few days
ago. I'm seeing some minor issues with this and lots of bogus warnings as far as
I can see.If I suspend/resume and unload e1000, then reinsert e1000.ko, I immediately hit
the WARN_ON at `msi.c:516: WARN_ON(!hlist_empty(&dev->saved_cap_space));`I'm not sure that's useful debugging information. even though saved state exists
the module has been removed, so you might want to purge the state table when the
driver gets removed?anyway, back to testing.
-
Sorry I should have been clear.
With the last two patches I sent out:
pci: Repair pci_save/restore_state so we can restore one save many times.
msi: Safer state caching.Those WARN_ON's are now totally bogus.
In essence the WARN_ON's were testing to ensure pci_save_state and
pci_restore_state were paired.The assumptions was that pci_restore_state would remove everything
from the list that pci_save_state placed on it.When I reworked the code and removed the bogus pairing requirement it meant
that if we ever save state and have a pci-e or pci-x capability we will have
a state structure on the list until the kernel reboots.In summary:
pci_save_state and pci_restore_state were making a wrong assumption
about the world. The WARN_ON patch tested to ensure the world matched
those functions. When I fixed those functions to match the world the
WARN_ON's became completely bogus.Eric
-
Got a hit on a powerpc g5:
PM: Writing back config space on device 0001:05:04.0 at offset 1 (was 2b00000, writing 2b00006)
------------[ cut here ]------------
Badness at drivers/pci/pci.c:679
Call Trace:
[C0000000080F7410] [C000000000011EFC] .show_stack+0x50/0x1cc (unreliable)
[C0000000080F74C0] [C0000000001AD610] .report_bug+0xa0/0x110
[C0000000080F7550] [C0000000000256E4] .program_check_exception+0xb4/0x670
[C0000000080F7630] [C0000000000046F4] program_check_common+0xf4/0x100
--- Exception: 700 at .pci_restore_state+0x310/0x340
LR = .pci_restore_state+0x2e0/0x340
[C0000000080F79D0] [C00000000026A174] .tg3_chip_reset+0x19c/0xa04
[C0000000080F7A90] [C00000000026D948] .tg3_reset_hw+0xa4/0x2718
[C0000000080F7BA0] [C000000000270030] .tg3_init_hw+0x74/0x94
[C0000000080F7C30] [C000000000270BE0] .tg3_open+0x4c8/0x854
[C0000000080F7CF0] [C0000000003A74A4] .dev_open+0x100/0x12c
[C0000000080F7D90] [C0000000003BAEA8] .netpoll_setup+0x2dc/0x3ec
[C0000000080F7E40] [C000000000283450] .init_netconsole+0x64/0x8c
[C0000000080F7EC0] [C0000000005C0BE4] .init+0x1d0/0x390
[C0000000080F7F90] [C0000000000271F8] .kernel_thread+0x4c/0x68
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is on for TX and on for RX.
Scheduler bitmap error - bitmap being reconstructed..
netconsole: network logging started
Calling initcall 0xc0000000006bd180: .macio_module_init+0x0/0x3c()That's:
pci_restore_pcix_state(dev);
pci_restore_msi_state(dev);
WARN_ON(!hlist_empty(&dev->saved_cap_space));return 0;
-
Hmm. Either I am confused of I just found an unanticipated leak.
pci_restore_msi_state should be out of the picture as we don't yet
have ppc msi support and I don't think the g5 generation hardware
supported it either.The only case I can see which might trigger this is if we saved
pci-X state and then didn't restore it because we could not find
the capability on restore.Any chance you could walk that list and find the cap_nr of the remaining
element?Something like:
{
struct pci_cap_saved_state *tmp;
struct hlist_node *pos;hlist_for_each_entry(tmp, pos, &pci_dev->saved_cap_space, next)
printk(KERN_INFO "saved_cap: 0x%02x\n", tmp->cap_nr);
}Until I get the best scenario I can come up with is a tg3 hardware bug
that doesn't renable the pci-X capability after a restore of power state.Getting that cap_nr will at least allow me to be certain if I am dealing
with msi, pci-X or pci-e.Unanticipated bugs aren't supposed to be this easy to find!
Eric
-
Hmm. pci_save_pcix_state/pci_restore_pcix_state seem to only handle
regular devices and seem to ignore the fact that for bridge PCI-X
capability has a different structure.Is this intentional? If not, here's a patch to fix this.
Warning: completely untested.PCI: restore bridge PCI-X capability registers after PM event
Restore PCI-X bridge up/downstream capability registers
after PM event. This includes maxumum split transaction
commitment limit which might be vital for PCI X.Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index df49530..4b788ef 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -597,14 +597,19 @@ static int pci_save_pcix_state(struct pci_dev *dev)
if (pos <= 0)
return 0;- save_state = kzalloc(sizeof(*save_state) + sizeof(u16), GFP_KERNEL);
+ save_state = kzalloc(sizeof(*save_state) + sizeof(u16) * 2, GFP_KERNEL);
if (!save_state) {
- dev_err(&dev->dev, "Out of memory in pci_save_pcie_state\n");
+ dev_err(&dev->dev, "Out of memory in pci_save_pcix_state\n");
return -ENOMEM;
}
cap = (u16 *)&save_state->data[0];- pci_read_config_word(dev, pos + PCI_X_CMD, &cap[i++]);
+ if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+ pci_read_config_word(dev, pos + PCI_X_BRIDGE_UP_SPL_CTL, &cap[i++]);
+ pci_read_config_word(dev, pos + PCI_X_BRIDGE_DN_SPL_CTL, &cap[i++]);
+ } else
+ pci_read_config_word(dev, pos + PCI_X_CMD, &cap[i++]);
+
pci_add_saved_cap(dev, save_state);
return 0;
}
@@ -621,7 +626,11 @@ static void pci_restore_pcix_state(struct pci_dev *dev)
return;
cap = (u16 *)&save_state->data[0];- pci_write_config_word(dev, pos + PCI_X_CMD, cap[i++]);
+ if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+ pci_write_config_word(dev, pos + PCI_X_BRIDGE_UP_SPL_CTL, cap[i++]);
+ pci_write_config_word(dev, pos + PCI_X_BRIDGE_DN_SPL_CTL, cap[i++]);
+ } else
+ pci_write_config_word(dev, pos + PCI_...
Probably not a such. I don't think we have any drivers for bridge
devices so I don't think it matters. It likely wouldn't hurt to fix
it just in case though.If you fix the offsets and diff this against my last fix (to never
Unless I am completely misreading the spec. While you have picked the
right register to save the offsets should be 0x08 and 0x0c or 8 and 12....Eric
-
Yes but they do their own thing wrt saving/restoring registers.
No, the spec is written in terms of dwords (32 bit), we are storing words (16 bits).
The data at offsets 8 and 12 is read-only split transaction capacity.
Split transaction limit starts at bit 16 so you need to add 2 to byte offset.Right?
--
MST
-
So, Eric, what are your thoughts on this patch (probably wrt 2.6.22)?
Since the rest of the save/restore code uses 16 bit transactions,
it should be safe here too?--
MST
-
thinking the code works.
The read-only and the read-write part are all defined as part of the
same register so I didn't expect them to be separate. And I hadn't
paid attention enough to see that the code was only saving 16bit
values.Rumor has it that some pci devices can't tolerate < 32bit accesses.
Although I have never met one. The two factors together suggest that
for generic code it probably makes sense to operate on 32bit
quantities, and just to ignore the read-only portion.Eric
-
The code for regular devices seems to use 16-bit accesses, so
I think it's best to stay consistent. Or do you want to change this too?--
MST
-
If we are stomping rare probabilities we might as well change that too.
The code to save pci-x state is relatively recent. So it probably just
hasn't met a problem device yet (assuming they exist).Eric
-
OK I guess. I gather we assume writing read-only registers has no side effects?
Are there rumors circulating wrt to these?--
MST
-
I haven't heard anything about that, and if we are writing the same value back
it should be pretty safe.I have heard it asserted that at least one version of the pci spec
only required 32bit accesses to be supported by the hardware. One of
these days I will have to look that and see if it is true. I do know
it can be weird for hardware developers to support multiple kinds of
decode. As I recall for pci and pci-x at the hardware level the only
difference in between 32bit transactions and smaller ones is the state
of the byte-enable lines.Eric
-
Is this the only place where Linux uses pci_read_config_word/pci_read_config_dword?
I think such hardware will be pretty much DOA on all OS-es. Why don't we waitTrue, and same holds for PCI-Express.
So let's assume hardware implements RO correctly but ignores the BE bits -
nothing bad happens then, right?--
MST
-
Speaking of tg3, make sure you are aware that the number of calls to
save-state functions may not match the number of calls to the
restore-state functions. ISTR seeing some memleak bugs in PCI related
to this.Jeff
-
Thanks that looks like the problem, multiple calls to save before one
call to restore when you have a pci-x capability would easily trigger
this problem. I just surveyed a bunch of the pci_save_state and
pci_restore_state users and this appears to be a common idiom not just
a tg3 thing....It looks like when code was added to save/restore the msi capability
was added to pci_save/restore_state that an assumption was added that
pci_save_state and pci_restore state were only used for suspend and
only used in pairs. There is even a partial bug fix that removed the
worst of the symptoms of that assumption from the msi code but failed
to recognize the core problem.Now that we have code to work with pcie and pcix capabilities as well
as msi this problem is much easier to hit.All of pci_save_state and pci_restore_state is going to have to be
restructured to fix this, and it is late in the game. Ugh.
Oh well, better to fix it nowAt least I get my answer about if what pci_save_state is doing
is reasonable. It is not. pci_save_state no longer supports being
used in conjunction with hardware reset and has become a
suspend/resume specific function.Now I'm off to wite some patches to fix this.
Eric
-
Well this is clearly a weird tangent from the topic of this thread but
it looks to have found some real bugs even if they aren't the ones we
are looking for.In short pci_save_state and pci_restore_state are used to two primary
was: As a pair called from the suspend and restore routines. One save
to multiple restores usually present in hardware reset routines.The additions to save/restore msi, pci-e and pci-x state failed to take
this second usage scenario into account.The next two patches address this problem.
This should directly fix the issue observed with the tg3, with multiple
saves and a single restore.It happens that to handle the reset case cleanly we also need to support
the odd usage model of the current e1000 driver.I have never have figured out how to get suspend/resume actually
working on any of my machines so the code path is untested. But the
patches are trivial and pretty much obviously correct.So if a couple people could do a quick suspend/resume test on these
patches to confirm I'm not blind that would be great.Eric
-
ouch! Are you interested in getting it work?
Ingo
-
I haven't even seriously tried. I don't yet have a laptop and I like
staying logged. The first time I rolled the jiffie counter it was a
happened after I had realized that if I kept my machine up for another
couple of months my jiffie counter would roll and I said why not So
basically suspend hasn't been a feature I'm been interested in using.
I get more annoyed at my UPS failing..I will probably get around to figuring out the whole suspend/resume
thing one of these days and it will probably take me a day or two.
But that doesn't result in bug fixes being delivered in a timely manner.Eric
-
There are two ways pci_save_state and pci_restore_state are used. As
helper functions during suspend/resume, and as helper functions around
a hardware reset event. When used as helper functions around a hardware
reset event there is no reason to believe the calls will be paired, nor
is there a good reason to believe that if we restore the msi state from
before the reset that it will match the current msi state. Since arch
code may change the msi message without going through the driver, drivers
currently do not have enough information to even know when to call
pci_save_state to ensure they will have msi state in sync with the other
kernel irq reception data structures.It turns out the solution is straight forward, cache the state in the
existing msi data structures (not the magic pci saved things) and
have the msi code update the cached state each time we write to the hardware.
This means we never need to read the hardware to figure out what the hardware
state should be.By modifying the caching in this manner we get to remove our save_state
routines and only need to provide restore_state routines.The only fields that were at all tricky to regenerate were the msi and msi-x
control registers and the way we regenerate them currently is a bit dependent
upon assumptions on how we use the allow msi registers to be configured and used
making the code a little bit brittle. If we ever change what cases we allow
or how we configure the msi bits we can address the fragility then.Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
drivers/pci/msi.c | 150 ++++++++--------------------------------------
drivers/pci/pci.c | 2 -
drivers/pci/pci.h | 2 -
include/linux/msi.h | 8 +--
include/linux/pci_regs.h | 1 +
5 files changed, 29 insertions(+), 134 deletions(-)diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 01869b1..ad33e01 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -100,6 +100,7 @@ static voi...
This is a note to let you know that I've just added the patch titled
Subject: msi: Safer state caching.
to my gregkh-2.6 tree. Its filename is
msi-safer-state-caching.patch
This tree can be found at
http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/
hm. There's some weird bisection artifact here. Here are the commits i
tested, in git-log order:#1 commit 01363220f5d23ef68276db8974e46a502e43d01d bad
#2 commit ee404566f97f9254433399fbbcfa05390c7c55f7 bad
#3 commit f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38 good
#4 commit c827ba4cb49a30ce581201fd0ba2be77cde412c7 badif i tell git-bisect that #1 is bad and #3 is good, then it offers me #2
- that's OK. But when i tell it that #2 is bad, it offers #4 - which is
out of order! The bisection goes off into la-la land after that and
never gets back to a commit that is /after/ the good commit. How is this
possible? (I upgraded from git-1.4.4 to 1.5.0 to make sure this isnt
some git bug that's already fixed.)i'll try to straighten this out manually, perhaps #3 is in some merge
branch that confuses bisection. Or maybe i misunderstood how git-bisect
works.Ingo
-
No it's not. "git bisect" does exactly the right thing. There is no simple
ordering in a complex branch-merge schenario, you can't just put the
commits in some "ordering" and test things in time order. That would be
totally broken, and idiotic. It doesn't give the right results.What git bisect does is to find the commit that most closely *bisects* the
history of commits, so that if it is marked good/bad, it will leave you
with about 50% of the commits left. But if you are looking at date order,
you're entirely confused.For example, let's take a really simple case
a <- bad
/ \
b c
| |
d e
| |
f g
\ /
h
|
* <-goodand if you are looking to find something "in the middle", you might thing
that "d" or "e" are the best choices, since time-wise, they are in the
middle.But that's not true AT ALL.
If you actually want to bisect that kind of history, you need to choose
"b" or "c", even though they may both be *much* more "recent" than the
others. Why? Because if you pick "d", you're really only testing three
commits ('d' 'f' and 'h') out of the 8 commits you have to test.In contrast, if you pick 'b', you are testing the effects of *four*
commits ('b', 'd', 'f' and 'h') and you have thus neatly bisected the
commits into two equal groups for testing (one group _with_ those four
commits, and one group _without_) instead of having partitioned them as 3
commits vs 5 commits.So please realize that non-linear history very much means that you MUST
NOT think that you just pick a commit "in the middle". No, git bisect is a
LOT smarter than that - it picks a commit that *reaches* about half theIt's possible because git knows what it is doing, and you didn't think
things through.The commits that "git bisect" picked out are the right ones. Quite often,
there may be two or more "equally good" commits (in my example above, you
can choose either "b" or "c", and it will bis...
git-bisect gets royally confused on those ACPI merge branches around
commit c0cd79d11412969b6b8fa1624cdc1277db82e2fe. Here are my test
results so far:commit 01363220f5d23ef68276db8974e46a502e43d01d: bad
commit 255f0385c8e0d6b9005c0e09fffb5bd852f3b506: bad
commit c0cd79d11412969b6b8fa1624cdc1277db82e2fe: bad
commit c24e912b61b1ab2301c59777134194066b06465c: good
commit e9e2cdb412412326c4827fc78ba27f410d837e6e: bad
commit 79bf2bb335b85db25d27421c798595a2fa2a0e82: bad
commit fc955f670c0a66aca965605dae797e747b2bef7d: good
commit 70c0846e430881967776582e13aefb81407919f1: good
commit 414f827c46973ba39320cfb43feb55a0eeb9b4e8: bad
commit f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38: good
commit 5f0b1437e0708772b6fecae5900c01c3b5f9b512: bad
commit b878ca5d37953ad1c4578b225a13a3c3e7e743b7: bad
commit c2902c8ae06762d941fab64198467f78cab6f8cd: bad
commit 12e74f7d430655f541b85018ea62bcd669094bd7: bad
commit 3388c37e04ec0e35ebc1b4c732fdefc9ea938f3b: bad
commit 9f4bd5dde81b5cb94e4f52f2f05825aa0422f1ff: badthe results are totally reproducible (i re-tried a few of both the good
and the bad commits), i.e. it's not a sporadic condition. Also, a number
of the 'bad' commits have no dynticks stuff in them at all, so i'd
exclude dynticks.could someone suggest a sane way to go with this? Perhaps suggest
specific commit IDs to test?Ingo
-
Looks like git bisect worked for you, and wasn't confused at all. You
started out with 2931 commits between your first known-bad and known-good
commits, which means that you usually end up having to check "log2(n)+1"
kernels, ie I'd have expected you to have to do 12-13 bisection attempts
to cut it down to one.You seem to have done 14 (you list 16 commits, two of which are the
starting points), which is right in that range. The reason you sometimes
get more is:- you "help" git bisect by choosing other commits than the optimal ones.
- with bad luck, it can be hard to get really close to "half the commits"
in the reachability analysis, especially if you have lots of merges
(and *especially* if you have octopus merges that merge more than two
branches of development). For example, say that you have something likea
|
+---+---+---+---+
| | | | |
b c d e fwhere you have six commits - you can't test any "combinations" at all,
since they are all independent, so "git bisect" cannot test them three
and three to cut down the time, so if you don't know which one is bad,
you'll basically end up testing them all.The bad luck case never really happens to that extreme in practice, and
even when it does you can sometimes be lucky and just hit on the bug early
(so "bad luck" may end up being "good luck" after all), but it explains
why you can get more - or less - than log2(n)+1 attempts. More commonly
one more.A much *bigger* problem is if you mark something good or bad that isn't
really. Ie if the bug comes and goes (it might be timing-dependent, for
example), the problem will be that you'll always narrow things down
(that's what bisection does), but you may not narrow it down to the right
thing!We've had that happen several times. If the bug (for example) means that
suspend *often* breaks, but sometimes works just by luck, you might mark a
kernel "good" when it really wa...
Side note: it's still usually fairly easy. Especially if you have a known
fix for the other bug, you can usually just do the equivalent ofgit cherry-pick <fixcommit>
at each point during this bisect (or just have a known patch that you keep
applying and un-applying), and you're largely done.Of course, if the area with the fix keeps changing, or if the fix is
really intrusive and nasty, this gets hairy, but at least in this case the
patch is fairly trivial and it shouldn't cause any trouble at all to do
this.The only real down-side is just the mindless extra work, and the possible
added confusion you get from modifying the history at the points you're
testing. "git bisect" is not necessarily happy about auto-picking a new
bisection point with a dirty tree, for example, so before you mark
something "good" or "bad", you should generally try to do so with a clean
git tree (ie if you apply a patch at each stage, do "git reset --hard" to
remove the patch before you do the "git bisect bad/good" stage).Similarly, especially at the end of the bisection run, if you actually use
"git cherry-pick" to *add* a commit, the bisection will now take that
added commit into account when trying to pick the next commit to check,
which is not what you really want. It probably doesn't matter that much
during the early stages (when bisection is really jumping around wildly
anyway, and one commit more or less doesn't really matter), but again, it
might be a good idea to make a habit of undoing the cherry-pick, the same
way you'd undo a patch (eg "git reset --hard HEAD^" would do it, if you
have exactly one cherry-pick that you tested).Linus
-
the situation is simpler than that: there is a /known/ bug, and i marked
the bugfix commit as 'good'. I never met such a multiple-bugs scenario
before and forgot that git-bisect could easily pick a tree without this
essential bugfix and would not be able to make a distinction between the
two types of badness.I'll try what i've described in the previous mail: mark all bisection
points that do not include f3ccb06f as 'good' - thus 'merging' the
known-bad area with the first known-good commit, and thus eliminating it
from the bisection space.(but it might also be useful to have a "git-bisect must-include" kind of
command that would allow this to be automated: mark a particular tree as
an essential component of the search space.)Ingo
-
this got me quite a bit further:
git-bisect start
git-bisect bad 01363220f5d23ef68276db8974e46a502e43d01d
git-bisect good f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38
git-bisect fake-good ee404566f97f9254433399fbbcfa05390c7c55f7
git-bisect bad d43a338e395371733a80ec473b40baac5f74d768
git-bisect bad 255f0385c8e0d6b9005c0e09fffb5bd852f3b506
git-bisect fake-good f99c6bb6e2e9c35bd3dc0b1d0faa28bd6970930d
git-bisect fake-good 0187f221e96e3436d552c0c7143f183eb82fb658
git-bisect bad 81450b73dde07f473a4a7208b209b4c8b7251d90
git-bisect fake-good ef29498655b18d2bfd69048e20835d19333981ab
git-bisect fake-good 8a03d9a498eaf02c8a118752050a5154852c13bf
git-bisect good 5c95d3f5783ab184f64b7848f0a871352c35c3cf
git-bisect good ecb5f7521a309cb9c5fc0832b9705cd2a03d7d45
git-bisect good 0539771d7236b425f285652f6f297cc7939c8f9a81450b73dde07f473a4a7208b209b4c8b7251d90 is first bad commit
[ note: by having the "git-bisect must-have-bugfix f3ccb06f"
functionality i mentioned in the previous mail git-bisect could
have eliminated the fake-good steps. ]it's not a resolution of this regression yet, because this commit is a
merge with upstream:| commit 81450b73dde07f473a4a7208b209b4c8b7251d90
| Merge: 8a03d9a... 0539771...
| Author: Len Brown <len.brown@intel.com>
| Date: Fri Feb 16 18:52:41 2007 -0500
|
| Pull misc-for-upstream into release branchwhich means that the fix in Len's tree got broken by merging with
upstream. Note: this 'upstream' in isolation is broken too, due to not
having that essential fix from Len's tree!So we quite likely have /two/ bugs, any of which breaks resume (which
breakage looks the same, so no way to isolate them via testing).I'll now try the following: i'll try to manually apply Len's fix to
every tree that git-bisect offers me, in the hope of being able to
isolate the /other/ bug.[ But really, i'm not expecting any miracles because this is way out of
league for git-...
I have confirmed these two on my system.
--
MST
-
BTW, the key here seems to be CONFIG_CC_OPTIMIZE_FOR_SIZE, at least that
is my current guess looking at the config options I still need to test
whether they make a difference. Either that, or the vmsplit option got
broken again. I'm attaching my x60 config that works for me.--
Jens Axboe
you could probably get quite a bit further in bisecting the other
breakage, by using the following method:manully apply the patch below to 81450b73dde and retest. It will most
likely work. Then FIRST unapply the patch and mark the tree via
'git-bisect good' and continue the bisection. Then try to apply the
patch again. If it's already included - ignore the rejected patch.
Whenever git-bisect offers you a new commit, just try to apply the
patch. Ok? This way you'll be able to avoid the known ACPI breakage, and
zoom in on the unknown breakage.Ingo
---------------->
commit f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38
Author: Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com>
Date: Tue Feb 13 02:35:50 2007 -0500ACPI: Disable wake GPEs only once.
fixes Suspend/Resume regressions due to recent ACPICA update.
Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>diff --git a/drivers/acpi/events/evgpe.c b/drivers/acpi/events/evgpe.c
index dfac3ec..635ba44 100644
--- a/drivers/acpi/events/evgpe.c
+++ b/drivers/acpi/events/evgpe.c
@@ -636,17 +636,6 @@ acpi_ev_gpe_dispatch(struct acpi_gpe_event_info *gpe_event_info, u32 gpe_number)
}
}- if (!acpi_gbl_system_awake_and_running) {
- /*
- * We just woke up because of a wake GPE. Disable any further GPEs
- * until we are fully up and running (Only wake GPEs should be enabled
- * at this time, but we just brute-force disable them all.)
- * 1) We must disable this particular wake GPE so it won't fire again
- * 2) We want to disable all wake GPEs, since we are now awake
- */
- (void)acpi_hw_disable_all_gpes();
- }
-
/*
* Dispatch the GPE to either an installed handler, or the control method
* associated with this GPE (_Lxx or _Exx). If a handler exists, we invoke
-
I just confirmed that 0539771d7236b425f285652f6f297cc7939c8f9a
Going to test that now.
--
MST
-
this method pointed out the real bug that we are interested in:
| 774c47f1d78e373a6bd2964f4e278d1ce26c21cb is first bad commit
| commit 774c47f1d78e373a6bd2964f4e278d1ce26c21cb
| Author: Avi Kivity <avi@qumranet.com>
| Date: Mon Feb 12 00:54:47 2007 -0800
| [PATCH] KVM: cpu hotplug supportundoing the 774c47f1 patch from HEAD gave me a working resume. I'll send
a fix for this KVM breakage in a separate mail.here's how the bisection went:
bad: [01363220f5d23ef68276db8974e46a502e43d01d] [PARISC] clocksource:
good: [fa285a3d7924a0e3782926e51f16865c5129a2f7] Linux 2.6.20
+good: [bcdb81ae29091f6a66369aabfd8324e4a53d05dc] knfsd: SUNRPC: add a
bad: [920841d8d1d61bc12b43f95a579a5374f6d98f81] Merge branch 'for-linus'
+bad: [86a71dbd3e81e8870d0f0e56b87875f57e58222b] sysctl: hide the sysctl
+bad: [719c91ccadd3ed26570dbb29d54166914832eee9] [POWERPC] Use udbg_early
+bad: [ebaf0c6032f525ddb0158fb59848d41899dce8cd] Merge branch 'for-linus'
+good: [8cd133073f9b5cd335c0b2e4740aceb025d50ca9] kvm: Fix mismatch between
+bad: [5b8e8ee6c65a34d8aafaeb8e2eaa97e496c2567c] ps3: add shutdown to
+bad: [a524d946bdced73c5fbe60170fb33611491c4211] tgafb: sync-on-green
+bad: [a268422de8bf1b4c0cb97987b6c329c9f6a3da4b] fbdev driver for S3
+good: [8d0be2b3bf4a55606967d7d84e56c52521e94333] KVM: VMX: add vcpu_clear()
+bad: [59ae6c6b87711ceb2d1ea5f9e08bb13aee947a29] KVM: Host suspend/resume
+bad: [774c47f1d78e373a6bd2964f4e278d1ce26c21cb] KVM: cpu hotplug supportthe commits prefixed with '+' were the ones where i had to hand-merge
the f3ccb06f commit to. Near the end of the bisection it nicely honed in
on the group of KVM changes that included the bad commit.but the conclusion is clear: if multiple bugs are present in the search
area then it gets quite difficult to sort it out via git-bisect - but
it's not impossible either. The following git-bisect enhancement could
have made things easier for me:git-bisect mark-must-have <tree>
which would ma...
Subject: [patch] KVM: T60 resume fix
From: Ingo Molnar <mingo@elte.hu>my T60 laptop does not resume correctly due to KVM attempting to send an
IPI to a CPU that might be down (or not up yet). (Doing so also triggers
the send_IPI_mask_bitmask() warning in arch/i386/kernel/smp.c, line
732.)with this fix applied my laptop does not hang during resume.
[ KVM will have to disable/enable virtualization on the CPU itself that
goes down / comes up, not via an IPI sent from the requesting CPU. ]Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
drivers/kvm/kvm_main.c | 6 ------
1 file changed, 6 deletions(-)Index: linux/drivers/kvm/kvm_main.c
===================================================================
--- linux.orig/drivers/kvm/kvm_main.c
+++ linux/drivers/kvm/kvm_main.c
@@ -2083,12 +2083,6 @@ static int kvm_cpu_hotplug(struct notifi
case CPU_DEAD:
case CPU_UP_CANCELED:
decache_vcpus_on_cpu(cpu);
- smp_call_function_single(cpu, kvm_arch_ops->hardware_disable,
- NULL, 0, 1);
- break;
- case CPU_UP_PREPARE:
- smp_call_function_single(cpu, kvm_arch_ops->hardware_enable,
- NULL, 0, 1);
break;
}
return NOTIFY_OK;
-
That is already CPU_ONLINE in my tree (and in the pull request sent to
Linus a couple of days ago).--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.-
that solves the resume problem - but doesnt solve the CPU_DEAD issue of
sending an IPI to an already offline CPU. Might be a better idea to do
it in CPU_DOWN_PREPARE? (and then to also add a CPU_DOWN_FAILED branch?)Ingo
-
Mainline now has DOWN_PREPARE and UP_CANCELED calling
->hardware_disable(), and ONLINE calling ->hardware_enabled(). What
tree are you looking at?[but I do see the need for DOWN_FAILED now. Off to find a resumable
machine...]--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.-
yeah, both DOWN_FAILED and UP_FAILED might trigger when some other bug
prevents a resume.Ingo
-
suspend/resume works fine now and there are no warning messages
whatsoever (with suspend simulation). Thanks Avi!Ingo
-
Where do I find this suspend simulation? Sounds like a great suspend
debugging tool.--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.-
it's just the simple hack below.
Ingo
------------------------>
Subject: [patch] suspend debugging: simulate suspend-to-RAM
From: Ingo Molnar <mingo@elte.hu>most resume bugs are due to the kernel hanging or crashing while trying
to resume a specific device. It is extremely hard to debug such hangs
because often when the hang happens there's no console available yet.Make debugging such bugs easier by offering a resume mode that does not
actually call into the BIOS to turn off the machine. This, in
combination with earlyprintk=serial,ttyS0,115200,keep ,
CONFIG_PM_DEBUG=y, CONFIG_DISABLE_CONSOLE_SUSPEND=y and
/sys/power/filter enables me to have a fully functional serial console
during the full suspend/resume cycle.I debugged two suspend bugs in the -rt kernel this way already, so it's
pretty useful.Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
Documentation/kernel-parameters.txt | 4 ++++
drivers/acpi/sleep/main.c | 15 ++++++++++++++-
include/linux/acpi.h | 1 +
kernel/sysctl.c | 8 ++++++++
4 files changed, 27 insertions(+), 1 deletion(-)Index: linux/Documentation/kernel-parameters.txt
===================================================================
--- linux.orig/Documentation/kernel-parameters.txt
+++ linux/Documentation/kernel-parameters.txt
@@ -163,6 +163,10 @@ and is between 256 and 4096 characters.acpi_osi= [HW,ACPI] empty param disables _OSI
+ acpi_simulate_suspend_to_ram
+ [KNL] Do not call into BIOS to do suspend-to-RAM
+ Format: <0/1>
+
acpi_serialize [HW,ACPI] force serialization of AML methodsacpi_skip_timer_override [HW,ACPI]
Index: linux/drivers/acpi/sleep/main.c
===================================================================
--- linux.orig/drivers/acpi/sleep/main.c
+++ linux/drivers/acpi/sleep/main.c
@@ -35,6 +35,14 @@ static u32 acpi_suspend_states[] = {static int init_8259A_after_S1;
+/*
+ * simulate entry into ...
Thanks, I'll put it to good use. Are you planning to upstream it? Not
for 2.6.21, presumably ;)--
error compiling committee.c: too many arguments to function-
I think Pavel wants it triggerable through /sys/power/state:
http://lkml.org/lkml/2007/1/25/99--
MST
-
it needs an essential cleanup: instead of modifying 'mem' behavior, it
should be a 'testmem' thing and no acpi_ flag.Ingo
-
I just tried Ingo's .config and it hangs on resume for me (with suspend to memory).
--
MST
-
could you try 's2ram' from http://suspend.sf.net ? It's basically
equivalent to 'echo mem > /sys/power/state', but you can usually also
see the reason why it doesnt resume, on the vga console. (by default the
kernel doesnt resume the t60's vga properly)on my box resume works, but it's incredibly slow. Takes a few minutes to
have total effect.Ingo
-
OK, I'll give it a try. Do I have to shut down X or is it enough to
I guess it is possible that what I mistook for a hang was actually an incredibly
slow system, and that it would resume, eventually. I waited for 5 min and gaveI have same here with my old .config (2-3 min).
However, after resume was completed, system would hang on the *next* suspend to
ram.Ingo, do you see this behaviour, or can you suspend/resume any number of times?
--
MST
-
havent tried that yet - first trying to figure out why the first resume
misbehaves ;)Ingo
-
I noticed ;)
Resend, please.
-
Could you un-CC: git@vger.kernel.org from this thread, please?
-
> with this fix applied my laptop does not hang during resume.
On the off chance that this is relevant, could you post
your .config please?--
MST
-
sure - find it below.
Ingo
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.21
# Mon Mar 5 11:15:42 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_ASYNC_SUPPORT=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32#
# General setup
#
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
# CONFIG_TASK_XACCT is not set
# CONFIG_UTS_NS is not set
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
# CONFIG_IKCONFIG is not set
CONFIG_CPUSETS=y
CONFIG_SYSFS_DEPRECATED=y
CONFIG_RELAY=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_EXTRA_PASS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y#
# Block layer
#
CONFIG_BLOCK=y
CONFIG_LBD=y
CONFIG_BLK_DEV_IO...
Since I don't normally have kvm loaded, this patch is unlikely
to help me, is that right?--
MST
-
That is correct.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.-
no - i simply picked them by hand, based on looking at gittk output,
because bisection did not appear to find anything useful:9f4bd5dde81b5cb94e4f52f2f05825aa0422f1ff is first bad commit
And via that method i found a couple of more 'good' points - which
git-bisect never picked up by itself. (and i did 3-4 separate git-bisect
sessions, one of them was a "git-bisect start drivers/acpi/" - which is
the main area of suspicion). I looked at git-bisect visualize more than
once, and i've attached one of the bisection logs below.i also think i know what happens. Firstly, my testing is reliable, as i
mentioned it in the other mail i frequently re-visited commits to make
sure that none of my bad/good decisions is spurios - but no, the test
results are extremely reproducable: either the laptop resumes properly
after flashing its disk light or it does not.the problem i think is that i simply took git-bisect's behavior for
granted (i used it many times already) but forgot about a very basic
precondition: git-bisect will find only a /single/ good->bad transition.If there is a bad->good transition combined with a good->bad transition
then git-bisect will think it's the same 'badness', while it's a
/former/ badness that it is honing in on - totally sending the bisection
off into la-la-land.so as i mentioned it in the first mail: i /know/ that this commit is a
bad->good transition point:f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38
/and i only want to test commits that include this commit/ - because i
know that without this commit git-bisect confuses the /other/ breakage
with the new breakage. In the bisection log below, this choice of
git-bisect:ee404566f97f9254433399fbbcfa05390c7c55f7
is 'bad' according to testing, but that's 'another' badness - and i
missed it.Now, having slept on it, the solution is very simple: whenever
git-bisect picks a commit for which the following command comes up
empty:git-log | grep f3ccb06f3b8e0...
Hm, does 2.6.20-mm2 work? If not, you can bisect the broken-out sereis
with quilt.Rafael
-
Which got me nowhere, after bisecting down from 1213 revisions to
nothing. f3ccb06f3b8e0cf42b579db21f3ca7f17fcc3f38 certainly works, just
verified again. Trying only acpi related changes now...--
Jens Axboe-
This email lists some known regressions in 2.6.21-rc1 compared to 2.6.20
that are not yet fixed in Linus' tree.If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affectected subsystem or driver, a patch
of you caused a breakage or I'm considering you in any other way possibly
involved with one or more of these issues.Due to the huge amount of recipients, please trim the Cc when answering.
Subject : resume: slab error in verify_redzone_free(): cache `size-512':
memory outside object was overwritten
References : http://lkml.org/lkml/2007/2/24/41
Submitter : Pavel Machek <pavel@ucw.cz>
Handled-By : Marcel Holtmann <marcel@holtmann.org>
Status : unknownSubject : ThinkPad T60: no screen after suspend to RAM
References : http://lkml.org/lkml/2007/2/22/391
Submitter : Michael S. Tsirkin <mst@mellanox.co.il>
Handled-By : Ingo Molnar <mingo@elte.hu>
Status : unknownSubject : HP nx6325 notebook: usb mouse stops working after suspend to ram
References : http://lkml.org/lkml/2007/2/21/413
Submitter : Arkadiusz Miskiewicz <arekm@maven.pl>
Caused-By : Konstantin Karasyov <konstantin.a.karasyov@intel.com>
commit 0a6139027f3986162233adc17285151e78b39cac
Status : unknownSubject : MacBook: AE_NOT_FOUND ACPI messages
References : http://bugzilla.kernel.org/show_bug.cgi?id=8066
Submitter : Thomas Meyer <thomas.mey@web.de>
Status : unknownSubject : Asus A8N-VM motherboard: boot failure (ACPI related)
References : http://lkml.org/lkml/2007/2/23/132
Submitter : Andrew <andrew@nelless.net>
Status : unknownSubject : Asus M6Ne notebook: ACPI: First battery is not detected
References : http://bugzilla.kernel.org/show_bug.cgi?id=8080
Submitter : Janosch Machowinski <jmachowinski@gmx.de>
Status : unknownSubject : Asus M6Ne/M6A notebooks: SATA_ACPI errors during kernel boot
Refer...
Arkadiusz,
Could try the attached patch to see if it solves the problem?
If not, please send the output of acpidump command and log.Regards.
just found a new one: forcedeth.c broke since v2.6.20. It generates a
few irqs after ifup and then stops dead (no in/out packets). Undoing all
of these commits fixes the problem:commit 21828163b2b31e0675cb3e66a2a4a231dec06236
commit 57fff6986b6daae13947c565786757c05303f0f6
commit 4e16ed1b0e17a3832310031e2ddaeb0914eb837d
commit f0734ab658390380079369f7090dcf7aa226f394
commit b01867cbd1853995946c8c838eff93a0885d8bc6
commit 445583b89d71b48cf8c64e26acc5a710248feed7
commit aaa37d2d099f97ced415546e285ac901e47a2437
commit 86b22b0dfbf462e6ed75e54fc83575dae01e3c69
commit 0d63fb32b2b8c3464d9c1afc3ce3fd3ceec025b6
commit 164a86e40e6c74ec5a91c364ccf7b1a2295b0a52
commit 761fcd9e3e0aa2a02231d1631f31409be5e890d2
commit d2f7841277d8613a780ab28d04d8f31a31816153
commit 1d39ed565cfcc7c4fe586de621aef495c4f94ffbthat's 75K of delta - i'll bisect it some more. Hardware in question:
00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
config options:
CONFIG_FORCEDETH=y
CONFIG_FORCEDETH_NAPI=yIngo
-
git bisection gives:
| 0d63fb32b2b8c3464d9c1afc3ce3fd3ceec025b6 is first bad commit
| commit 0d63fb32b2b8c3464d9c1afc3ce3fd3ceec025b6
| Author: Ayaz Abdulla <aabdulla@nvidia.com>
| Date: Tue Jan 9 13:30:13 2007 -0500
|
| forcedeth: rx skb recycle
|
| This patch removes the code that recycled the skb on error. This will
| help in reducing the branches in the main data paths.
|
| Signed-Off-By: Ayaz Abdulla <aabdulla@nvidia.com>
|
| Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
| :040000 040000 0e4cb0aa12ba1df6a7ca84da697c75082b449ece b70edcf390d63ac9e0b35d2aea7b59efe2bfeaea M driversbut i think this is simply the first commit that makes the driver
essentially non-bisectable - the failures in interim iterations changed
from 'interface unusable' to boot hangs that were either forcedeth tx
ring timeouts, or hangs after nv_sata initialization. I've attached the
full .config below.Ingo
-------------------------->
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.21-rc1-syslet-v5
# Tue Feb 27 09:46:29 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_ASYNC_SUPPORT=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32#
# General setup
#
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
CONFIG_TASKSTATS=y
CONFIG_TASK...
here's the boot log of the tx_ring timeout that happens at the following
commit in the first step of bisection:commit aaa37d2d099f97ced415546e285ac901e47a2437
Author: Ayaz Abdulla <aabdulla@nvidia.com>
Date: Sun Jan 21 18:10:42 2007 -0500forcedeth: tx limiting
(eth0/eth1 gets flipped during network startup, so eth1 is the forcedeth
driver)Ingo
---------------------->
Linux version 2.6.21-rc1-syslet-v5 (mingo@dione) (gcc version 4.0.2) #8 SMP PREEMPT Tue Feb 27 10:33:37 CET 2007
BIOS-provided physical RAM map:
sanitize start
sanitize end
copy_e820_map() start: 0000000000000000 size: 000000000009f800 end: 000000000009f800 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 000000000009f800 size: 0000000000000800 end: 00000000000a0000 type: 2
copy_e820_map() start: 00000000000f0000 size: 0000000000010000 end: 0000000000100000 type: 2
copy_e820_map() start: 0000000000100000 size: 000000003fef0000 end: 000000003fff0000 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 000000003fff0000 size: 0000000000003000 end: 000000003fff3000 type: 4
copy_e820_map() start: 000000003fff3000 size: 000000000000d000 end: 0000000040000000 type: 3
copy_e820_map() start: 00000000e0000000 size: 0000000010000000 end: 00000000f0000000 type: 2
copy_e820_map() start: 00000000fec00000 size: 0000000001400000 end: 0000000100000000 type: 2
BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS)
BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
127MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000f5680
Entering add_active_range(0, 0, 262128) 0 entries of 256 used
Zone PFN ...
update: Jeff sent me 3 pending forcedeth.c fixes, and they indeed fix
the regression. The one that fixed it is:----------------->
The napi poll routine was missing the call to the optimized rx process
routine. This patch adds the missing call for the optimized path.Signed-Off-By: Ayaz Abdulla <aabdulla@nvidia.com>
--- orig/drivers/net/forcedeth.c 2007-02-20 03:17:21.000000000 -0500
+++ new/drivers/net/forcedeth.c 2007-02-20 03:28:31.000000000 -0500
@@ -3104,13 +3104,17 @@
struct fe_priv *np = netdev_priv(dev);
u8 __iomem *base = get_hwbase(dev);
unsigned long flags;
+ int retcode;- if (np->desc_ver == DESC_VER_1 || np->desc_ver == DESC_VER_2)
+ if (np->desc_ver == DESC_VER_1 || np->desc_ver == DESC_VER_2) {
pkts = nv_rx_process(dev, limit);
- else
+ retcode = nv_alloc_rx(dev);
+ } else {
pkts = nv_rx_process_optimized(dev, limit);
+ retcode = nv_alloc_rx_optimized(dev);
+ }- if (nv_alloc_rx(dev)) {
+ if (retcode) {
spin_lock_irqsave(&np->lock, flags);
if (!np->in_shutdown)
mod_timer(&np->oom_kick, jiffies + OOM_REFILL);-
Ok, that was included in the bunch I just pulled and pushed out, so
hopefully this one is resolved. Pls verify.Linus
-
yep, can confirm that it's now fixed upstream too.
Ingo
-
And lot of acpi/suspend changes, which seem to break my machine in
weird and not really reproducible way. I'm looking onto that.(Yep, that should teach me to test -mm a bit more).
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
Hello.
I cannot boot 2.6.21-rc1; it falls into OOM-Killer.
Interesting error message I can see is:
request_module: runaway loop modprobe net-pf-1After bisecting, the commit
Driver core: let request_module() send a /sys/modules/kmod/-uevent
(id c353c3fb0700a3c17ea2b0237710a184232ccd7f) is to blame.Reverting it fixes the issue to me.
Regards,
--yoshfuji
-
Hi Thomas,
I'm testing NO_HZ on my machines. On the laptop I see that the timer
interrupt counter is incremented (though slower than HZ). This machine
is running UP kernel.On my desktop I see this:
CPU0 CPU1
0: 114 0 IO-APIC-edge timer
1: 1624 10771 IO-APIC-edge i8042
6: 3 0 IO-APIC-edge floppy
7: 0 0 IO-APIC-edge parport0
9: 0 0 IO-APIC-fasteoi acpi
12: 40111 184047 IO-APIC-edge i8042
16: 75624 998858 IO-APIC-fasteoi radeon@pci:0000:01:00.0, uhci_hcd:usb1
17: 0 0 IO-APIC-fasteoi uhci_hcd:usb4
18: 711 5487 IO-APIC-fasteoi ide1, libata, ehci_hcd:usb7, uhci_hcd:usb3
19: 617 2254 IO-APIC-fasteoi libata, uhci_hcd:usb2
20: 0 0 IO-APIC-fasteoi ehci_hcd:usb6, uhci_hcd:usb5
21: 2483869 0 IO-APIC-fasteoi eth0
22: 2 0 IO-APIC-fasteoi ohci1394
218: 28872 360643 PCI-MSI-edge HDA Intel
219: 32932 138196 PCI-MSI-edge libata
NMI: 0 0
LOC: 2761191 2827539
ERR: 0
MIS: 0Interrupt 0 is stuck at 114 (the number is consistent across reboots). I
don't experience any problem, time is running fine. Still it's strange
that the timer is doing nothing; maybe something other than the PIT is
used for time keeping?This is the dmesg of the "abnormal" machine (dual core, SMP kernel):
Linux version 2.6.20-ge696268a-dirty (kronos@dreamland.darkstar.lan) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #33 SMP PREEMPT Tue Feb 20 23:24:24 CET 2007
BIOS-provided physical RAM map:
sanitize start
sanitize end
copy_e820_map() start: 0000000000000000 size: 000000000009c800 end: 000000000009c800 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 000000000009c800 size: 0000000000003800 end: 00000000000a0000 type:...
Yes, we switch away from PIT and use the local APIC timer. (LOC)
tglx
-
What's the benefit of doing so - and why has not it been done before?
I mean, I run a regular 2.6.18.6,
/sys/devices/system/clocksource/clocksource0/current_clocksource (is this
related?) shows "acpi_pm", but the IRQ0 counter increases at HZ. Maybe I
am confusing things, but why the need for PIT when clocksource is acpi_pm
anyway?Jan
--
-
acpi_pm is only a readout device to keep track of current time.
PIT and local APIC timer are used to provide either periodic or one shot
programmable timer events.Up to now the kernel started PIT and local APIC timer in parallel with
the same period where PIT incremented jiffies and local APIC timer
called update_process_times() and profile_tick. We changed this to let
the boot cpu increment jiffies inside the local apic timer interrupt
after PIT has been stopped. A whole interrupt for jiffies64++ is waste.Also when we switch to nohz / high resolution mode, we want to use the
local APIC, as it is much faster to access. The maximum delta to program
is 27ms for the pit and ~1sec for the local APIC. This is important for
dynticks, as we can achieve longer idle sleeps w/o reprogramming the
timer.tglx
-
From: Thomas Gleixner <tglx@linutronix.de>
BTW, I'm adding support for sparc64, and before I get much further
will the code handle a oneshot-only device? That's basically what I
have (sparc64 basically has a TSC and a "comparison" register, you
write the "comparison" register with the TSC value at which you'd like
the timer interrupt to occur, so it's one-shot and you have to write
it again to get the next timer).I see logic in the generic code to do periodic events when the
timer only provides one-shot ticks. But as far as I can tell
both the HPET and the local APIC support periodic timers so
I can't tell how much testing that code has gotten :-)I, in fact, don't see any clockevent device in the current tree that
does not set CLOCK_EVT_FEAT_PERIODIC.I guess you could clear that bit just to test those generic code
paths. :-)BTW, sparc64 always did the trick where the do_timer() work was done
by one of the per-cpu local timer interrupts, I'm glad the idea gained
traction generically. :-)))
-
It was already implemented before for x86-64, but disabled
by default because it ran into some platform issues. The new code
hopefully knows now how to work around them.-Andi
-
Yes, all you need is to omit the CLOCK_EVT_FEAT_PERIODIC flag when you
tglx
-
From: Thomas Gleixner <tglx@linutronix.de>
Thanks a lot Thomas.
I noticed while doing this work that the generic clock code is
incompatible with the time interpolator, since both provide a
do_{get,set}timeofday() implementation. Just FYI...
-
David,
<trimmed CC due to sparcness, added John instaed >
John, can you have a look at this ?
tglx
-
From: Thomas Gleixner <tglx@linutronix.de>
Note we may not care :-) All the interpolator does is provide
a generic tick based gettimeofday implementation, and the generic
clock code does essentially the same thing.And the generic clock code must in fact provide the implementations
in order to deal with dyntick issues.The only other platform using the time interpolator is IA64,
and after my dyntick implementation on sparc64 the one and
only user will be IA64 :-)
-
Yea. I actually have some in-progress patches from Peter Keilty that
convert ia64 and sparc64 time_interpolators to clocksources, then
removes the time_interpolator code.The ia64 conversion is more complicated due to the fsyscall asm, but I
think the sparc64 conversion (below) is pretty straight forward. I've
only built tested this, so I have no clue if it actually works.Any thoughts?
thanks
-johnFrom: Peter Keilty <peter.keilty@hp.com>
Initial sparc64 conversion to generic clocksource/timekeeping code.
Signed-off-by: Peter Keilty <peter.keilty@hp.com>
Signed-off-by: John Stultz <johnstul@us.ibm.com>Kconfig | 2 +-
defconfig | 2 +-
kernel/time.c | 34 +++++++++++++++++++++++-----------
3 files changed, 25 insertions(+), 13 deletions(-)linux-2.6.21-rc1_timeofday-arch-sparc64_C7.patch
============================================
diff --git a/arch/sparc64/Kconfig b/arch/sparc64/Kconfig
index f75a686..13a4547 100644
--- a/arch/sparc64/Kconfig
+++ b/arch/sparc64/Kconfig
@@ -34,7 +34,7 @@ config LOCKDEP_SUPPORT
bool
default y-config TIME_INTERPOLATION
+config GENERIC_TIME
bool
default ydiff --git a/arch/sparc64/defconfig b/arch/sparc64/defconfig
index 0f44a6a..33e061a 100644
--- a/arch/sparc64/defconfig
+++ b/arch/sparc64/defconfig
@@ -9,7 +9,7 @@ CONFIG_64BIT=y
CONFIG_MMU=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
-CONFIG_TIME_INTERPOLATION=y
+CONFIG_GENERIC_TIME=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
diff --git a/arch/sparc64/kernel/time.c b/arch/sparc64/kernel/time.c
index f84da4f..497b29b 100644
--- a/arch/sparc64/kernel/time.c
+++ b/arch/sparc64/kernel/time.c
@@ -31,6 +31,7 @@ #include <linux/percpu.h>
#include <linux/profile.h>
#include <linux/miscdevice.h>
#include <linux/rtc.h>
+#include <linux/clocksource.h>#include <asm/oplib.h>
#include <asm...
-
From: john stultz <johnstul@us.ibm.com>
Hey John, I had to do this already in order to do the dynticks
port to sparc64, but nice to see another attempt :-)Two things I did on my side:
+ .mask = 0xffffffffffffffffLL,
I used CLOCKSOURCE_MASK(64) here.
+static cycle_t read_sparc64_cpuclock(void)
+{
+ return (cycle_t)get_cycles();
+}
...
+ .read = read_sparc64_cpuclock,You can just directly assign tick_ops->get_tick to .read at run-time
to avoid a stack frame and function call/return.+ .shift = 16,
These shift selections all seem rather arbitrary.
If it's not an arbitrary selection, it would be nice to have some
comments about how to go about choosing an appropriate shift.
I imagine the selections has to do with the possible range of
the frequencies the clocksource supports, and how much
accuracy you get for certain shift selections given that range.
-
As I suspected, the one-shot code wasn't very well tested and I'd be
the one to debug this thing on sparc64 :-)When a timer exceeds the timer period, the one-shot handling code does
the following loop:for (;;) {
ktime_t next = ktime_add(dev->next_event, tick_period);if (!clockevents_program_event(dev, next, ktime_get()))
return;
tick_periodic(cpu);
}So it just keeps running tick_periodic() until we "catch up".
Problem is, if clockevents_program_event() gets a "next" time in the
past, the very case where we'll loop, it DOES NOT update
dev->next_event. It returns the error before doing so.As a result of this, we'll loop forever here, the softlockup watchdog
will trigger, and the system will wedge completely.I was getting a softlockup and immediate system hang, so to debug this
I kept a history of the last 8 TSC values when tick_periodic() was
invoked. At softlockup trigger, I'd dump the log. And what I saw
were TSC deltas that we so tiny as to be just enough to indicate
tick_periodic() was running in a tight loop :-)I propose the following fix, which I'm about to test.
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 4500e34..0986a2b 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -77,6 +77,7 @@ static void tick_periodic(int cpu)
void tick_handle_periodic(struct clock_event_device *dev)
{
int cpu = smp_processor_id();
+ ktime_t next;tick_periodic(cpu);
@@ -86,12 +87,12 @@ void tick_handle_periodic(struct clock_event_device *dev)
* Setup the next period for devices, which do not have
* periodic mode:
*/
+ next = ktime_add(dev->next_event, tick_period);
for (;;) {
- ktime_t next = ktime_add(dev->next_event, tick_period);
-
if (!clockevents_program_event(dev, next, ktime_get()))
return;
tick_periodic(cpu);
+ next = ktime_add(next, tick_period);
}
}-
Yep, that's caused by a late change for the *!&#ed broadcast stuff for
-
Correct. The higher the shift value, the more precise NTP multiplier
adjustment we can make. However, too large w/ a high frequency
clocksource and you'll risk overflowing 64bits on the mult.Although that's not super critical anymore since Roman implemented the
high-res error accounting, the net effect should be the same over the
long term(we just have to work harder w/ courser adjustments - resulting
in very small clock frequency oscillations).I do need to get some documentation going. I had some back when I first
started pushing the changes, but so much was revised and rewritten it
stopped being correct.-john
-
From: john stultz <johnstul@us.ibm.com>
While we were discussing this I was debugging my clocksource sparc64
implementation, a shift of 32 didn't work whereas one of 16 (as
in your version of sparc64 support) did. :-)There is also a very unfortunate inconsistency between how the shift
and multiplier are used in clock sources vs. clock events.You can initialize your clocksource multiplier using:
mult = clocksource_hz2mult(ticks_per_hz, SHIFT);
but WOE BE TO HE who tries to use that for clockevents (like I
did initially) :-) You have to instead use something like:mult = div_sc(ticks_per_hz, NSEC_PER_SEC, SHIFT);
I would have been done with the sparc64 dynticks support yesterday if
I didn't trip over all this stuff.It also isn't clear from the comments that the first argument
to clocksource_hz2mult() and div_sc() is indeed "ticks_per_hz".
I had to gleen over the hpet and LAPIC drivers to kind of figure
this out. Another potential improvement for the comments :-)
-
you're mixing up 2 concepts:
1) clocksource
2) eventsource1) is for "what time is it now", and acpi_pm is useful for that, as are
several other things such as rdtsc
2) is for "I need THIS to happen X miliseconds from now". acpi_pm is not
useful for that, nor is rdtsc.some can be used for both (PIT), but on a concept level the uses are
independent. The advantage of local apic over PIT is that local apic is
cheap to do "one shot" future events with, while the PIT will tick
periodic at a fixed frequency. With tickless idle.. that's not what you
want.--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org-
So with a local apic, and acpi_pm as clocksource, I shouldn't be getting timer
interrupts? Yet I do. Which I assume means that the kernel will still get woken
up very often.Rgds
--
-- Pierre OssmanLinux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
-
timer interrupts as in "irq0"?
you shouldn't if you use the hrtimers/tickless stuff...
if irq0 keeps increasing at 100Hz or 1000Hz or so.. then yes
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org-
Yes:
CONFIG_HIGH_RES_TIMERS=y
Sure. My dmesg is full of mmc debug crud right now, but I'll just reboot
and I'll have a clean one for you.Rgds
--
-- Pierre OssmanLinux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org-
Here we go.
--
-- Pierre OssmanLinux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
---------------------------------------------------------^^^^^
Here is the reason. The local APIC stops working in C3 state and we fall
back to the PIT in that case. Not really exciting for dynticks, but the
only way to keep the system alive. There is a patch coming up from
Intel, which finds out how to use HPET even if it is not enabled by the
BIOS. This will still end up on IRQ#0, but will give way longer idle
sleeps than the PIT.tglx
-
Thomas,
I have the patchset for this HPET part ready to roll out. But, looks
like
NO_HZ and lapic eventsource support is only in i386 and not in x86-64 in
2.6.21-rc1.
Do you know the state of NO_HZ x86-64 support?Will it get into git soon? In which case I can rebase my patches against
git
and send it out for review/testing.Otherwise, I will send out my patches, which are against rt (and my
initial
implementation of this HPET part is only for x86-64).Thanks,
Venki
-
So then the next two questions are; is it possible to disable C3 and is
it a net power gain to get rid of the wakeups in favor of having C3.Rgds
--
-- Pierre OssmanLinux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org-
no; c3 saves a TON more power.
you can try enabling HPET in your BIOS...
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org-
Hah, I wish! This is a laptop, so the BIOS is as brain dead and broken
as is humanly possible.Can I determine if I have the required hardware? So I can tell if I'm
permanently screwed, or just temporarily.Rgds
--
-- Pierre OssmanLinux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org-
Hi,
http://lkml.org/lkml/2006/11/14/153
and related posts in this older thread
("CONFIG_NO_HZ: missed ticks, stall (keyb IRQ required) [2.6.18-rc4-mm1]")
should help.Andreas Mohr
-
if it's something built in the last year or two you have the hw.
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org-
I have an ICH4-M, and from Intel's datasheets it looks like I got the
short straw..--
-- Pierre OssmanLinux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org-
Before this becomes a FAQ. Would anybody mind if I just renamed "timer"
to "pit" to make this clear?-Andi
-
Ok, thanks for the clarification.
Luca
-
Hi,
CHK include/linux/version.h
CHK include/linux/utsrelease.h
CHK include/linux/compile.h
CC [M] drivers/char/ip2/ip2main.o
In file included from drivers/char/ip2/ip2main.c:285:
drivers/char/ip2/i2lib.c: In function `iiSendPendingMail_t':
drivers/char/ip2/i2lib.c:83: sorry, unimplemented: inlining failed in call
to 'iiSendPendingMail': function body not available
drivers/char/ip2/i2lib.c:157: sorry, unimplemented: called from here
make[3]: *** [drivers/char/ip2/ip2main.o] Error 1
make[2]: *** [drivers/char/ip2] Error 2
make[1]: *** [drivers/char] Error 2
make: *** [drivers] Error 2With cleanup changes in commit 40565f1962c5be9b9e285e05af01ab7771534868
compilation fails.Regards,
- Faik
-
Yeah, that thing was crud.
Linus
---
commit 5fc7e655a50b0a19229a6b4a8a5e23bfedf700a4
Author: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date: Wed Feb 21 11:18:26 2007 -0800Fix bogus 'inline' in drivers/char/ip2/i2lib.c
Not only was the function way too big to be inlined in the first place,
it was used before it was even defined.Noted-by: Faik Uygur <faik@pardus.org.tr>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>diff --git a/drivers/char/ip2/i2lib.c b/drivers/char/ip2/i2lib.c
index f86fa0c..e46120d 100644
--- a/drivers/char/ip2/i2lib.c
+++ b/drivers/char/ip2/i2lib.c
@@ -80,7 +80,7 @@ static int i2RetryFlushOutput(i2ChanStrPtr);
// Not a documented part of the library routines (careful...) but the Diagnostic
// i2diag.c finds them useful to help the throughput in certain limited
// single-threaded operations.
-static inline void iiSendPendingMail(i2eBordStrPtr);
+static void iiSendPendingMail(i2eBordStrPtr);
static void serviceOutgoingFifo(i2eBordStrPtr);// Functions defined in ip2.c as part of interrupt handling
@@ -166,7 +166,7 @@ static void iiSendPendingMail_t(unsigned long data)
// If any outgoing mail bits are set and there is outgoing mailbox is empty,
// send the mail and clear the bits.
//******************************************************************************
-static inline void
+static void
iiSendPendingMail(i2eBordStrPtr pB)
{
if (pB->i2eOutMailWaiting && (!pB->i2eWaitingForEmptyFifo) )
-
What compiler?
thanks,
--
http://www.fi.muni.cz/~xslaby/ Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8 22A0 32CC 55C3 39D4 7A7E
-
Oh, I can reproduce with gcc 3.4. Going to fix it.
thanks,
--
http://www.fi.muni.cz/~xslaby/ Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8 22A0 32CC 55C3 39D4 7A7E
-
On i386 I get the following,
TCP cubic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
Testing NMI watchdog ... CPU#0: NMI appears to be stuck (24->24)!
CPU#1: NMI appears to be stuck (0->0)!
CPU#2: NMI appears to be stuck (0->0)!
CPU#3: NMI appears to be stuck (0->0)!when I add nmi_watchdog=1 to my boot args which worked on prior kernels.
On closer inspection it looks like arch/i386/kernel/io_apic.c :
check_timer() --> timer_irq_works() depends on IRQ0 incrementing jiffies
which is no longer the case AFAIK.I'm not sure exactly how that relates to the NMI, but the check_timer()
function disabled the NMI through the io-apic if it can't get the
"timer" working through the io-apic.Daniel
-
At this point the PIT / HPET _is_ active and incrementing jiffies. The
Boot log please.
tglx
-
.config is in there too .
ftp://source.mvista.com/pub/dwalker/tglx/
-
Why? The switch just stops the PIT/HPET. It does not fiddle with IO_APIC
Nothing obvious. Bisect time :(
tglx
-
I already bisected this on my old pIII, which has the same problem:
clockevents-i386-drivers.patch
-
yes - we know what the problem is (and will fix it): the stopping of the
PIT - nmi_watchdog=1 is hack to use the IO-APIC's PIT pin to also signal
NMIs.Just to clarify, this problem does not occur if HIGH_RES_TIMERS is off,
correct?Ingo
-
s/does not occur/does occur/
we switch off the PIT whenever we can.
Ingo
-
It does, as we switch off PIT when lapic is available in any case.
tglx
-
I'm not an expert on the io-apic, but the check_timer() function seemed
Well, I'm pretty sure it's HRT, cause in prior versions this only
happened when HRT is enabled. Then you guys went to the lapic all the
time, and now this is happening all the time ..You can't reproduce this?
Daniel
-
Again:
check_timer() is called _BEFORE_ we even touch the local APIC timers. At
The NMI is stuck:
if (nmi_count(cpu) - prev_nmi_count[cpu] <= 5) {
printk("CPU#%d: NMI appears to be stuck (%d->%d)!\n",
cpu,
prev_nmi_count[cpu],
nmi_count(cpu));This has nothing to do with jiffies.
Nope.
Also all my machines emit something like:
"ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])"In your boot log nothing to see.
tglx
-
Right, but eventually there isn't a regular timer interrupt through the
io-apic. I don't think in the past IRQ0 stops without the system
crashing, so check_timer() could assume the timer (IRQ0) is _always_
regular.do you know what the requirement are for routing the NMI through the
I think it has to do with IRQ0 . Did I mention this doesn't happen in
Do you use nmi_watchdog=1 ?
Daniel
-
Sorry. I checked. switching PIT off really breaks nmi_watchdog=1, as
this just mirrors IRQ#0 to the NMI. No IRQ#0 from PIT, no NMIWe could keep PIT running with an empty interrupt handler when
nmi_watchdog=1 is set, but this interferes nicely with broadcasting.Does nmi_watchdog=2 work ? We might switch to that, when a local APIC is
available.tglx
-
There's a compile failure during my bisect.
distcc[3863] ERROR: compile /tmp//hrtimer.tmp.dwalker1.3795.i on dwalker3/120 failed
kernel/hrtimer.c: In function 'hrtimer_cpu_notify':
kernel/hrtimer.c:884: warning: implicit declaration of function 'clockevents_notify'
kernel/hrtimer.c:884: error: 'CLOCK_EVT_NOTIFY_CPU_DEAD' undeclared (first use in this function)
kernel/hrtimer.c:884: error: (Each undeclared identifier is reported only once
kernel/hrtimer.c:884: error: for each function it appears in.)
drivers/ide/setup-pci.c: In function 'ide_scan_pcibus':
drivers/ide/setup-pci.c:866: warning: ignoring return value of '__pci_register_driver', declared with attribute warn_unused_result
make[1]: *** [kernel/hrtimer.o] Error 1from this commit,
commit f8381cba04ba8173fd5a2b8e5cd8b3290ee13a98
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Fri Feb 16 01:28:02 2007 -0800[PATCH] tick-management: broadcast functionality
With Ingo Molnar <mingo@elte.hu>
Add broadcast functionality, so per cpu clock event devices can be registere
as dummy devices or switched from/to broadcast on demand. The broadcast
function distributes the events via the broadcast function of the clock even
device. This is primarily designed to replace the switch apic timer to / fr
IPI in power states, where the apic stops.-
When that happens, you need to pick another commit to try than the one git
selected for you automatically. You can do that by doinggit bisect visualize
and select another commit somewhere fairly close to a mid-point, and try
that withgit reset --hard <commit-sha1>
instead.
Linus
-
hrmpf. we made it bisectable at some point.
tglx
-
It was related some code under a hot plug ifdef ..
Here's the final commit from the bisect which caused it . It says "No
changes to existing functionality" ?e9e2cdb412412326c4827fc78ba27f410d837e6e is first bad commit
commit e9e2cdb412412326c4827fc78ba27f410d837e6e
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Fri Feb 16 01:28:04 2007 -0800[PATCH] clockevents: i386 drivers
Add clockevent drivers for i386: lapic (local) and PIT/HPET (global). Update
the timer IRQ to call into the PIT/HPET driver's event handler and the
lapic-timer IRQ to call into the lapic clockevent driver. The assignement of
timer functionality is delegated to the core framework code and replaces the
compile and runtime evalution in do_timer_interrupt_hook()Use the clockevents broadcast support and implement the lapic_broadcast
function for ACPI.No changes to existing functionality.
-
Ok, it wouldn't be the first time some change that is supposed to change
nothing does actually change something.That said, one thing to worry about when doing bisection: the kernel
configuration.If you always just do "make oldconfig" or something, the kernel config for
the thing you test will depend on the _previous_ kernel you compiled, and
that is not always what you want. I've once had a failing kernel, did
bisection, and it turned out that since I had gone back in time to before
the option that caused the failure even existed, I had (by mistake) then
compiled some of the later kernels without that option enabled, and called
them "good".The end result: "git bisect" didn't actually end up pointing to the right
commit, just because I had effectively lied to it.That said, considering that you did get a commit that doesn't look
entirely unlikely (and that clearly changes things that are relevant), I
suspect you did actually find the right one.Linus
-
This bit me badly the one time I did a git bisect; it kept ping-
ponging around a big change (sata? xtables?) that required me to
answer the same two dozen questions about ten different times.If git-bisect itself could manage .config as part of the process,
reverting to the one used with the most-recently-in-the-past try when
it backs up to a revision, that would be, you know, wonderful.--Pete
----------------------------------
Pete Harlan
ArtSelect, Inc.
harlan@artselect.com
http://www.artselect.com
ArtSelect is a subsidiary of a21, Inc.
-
In this case I don't think anything was specifically turned on, beyond
SMP. For instance HRT/dynamic tick was off.I didn't run "make oldconfig", but just running "make" asked for options
I think if it's not that exact commit it's still one in that set. I
mainly wanted to confirm that it was an hrt/dynamic tick issue , and not
some left field patches..Daniel
-
Yup, thats the one which switches off PIT after we have the local APIC
timers up and running. Which turns out to cause the nmi_watchdog not
working anymore.tglx
-
Oddly, nmi_watchdog=2 doesn't work in 2.6.21-rc1, but it works in
2.6.20-rt8 however I'm not sure of the config could have been PREEMPT_RT
was on.Daniel
-
Yup. Fun starts in drivers/net/e1000
e1000 is not working anymore. ifup fails permanentely.
ADDRCONF(NETDEV_UP): eth0: link is not ready
nothing elsebisect identifies:
d2ed16356ff4fb9de23fbc5e5d582ce580390106 is first bad commit
commit d2ed16356ff4fb9de23fbc5e5d582ce580390106
Author: Kok, Auke <auke-jan.h.kok@intel.com>
Date: Fri Feb 16 14:39:26 2007 -0800e1000: fix shared interrupt warning message
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>Reverting this patch on top of -rc1 helps as well.
tglx
lspci output:
04:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
Subsystem: Intel Corporation Unknown device 30a5
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 218
Region 0: Memory at 90100000 (32-bit, non-prefetchable) [size=128K]
Region 2: I/O ports at 2000 [size=32]
Capabilities: [c8] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable+
Address: 00000000fee0100c Data: 414a
Capabilities: [e0] Express Endpoint IRQ 0
Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag-
Device: Latency L0s <512ns, L1 <64us
Device: AtnBtn- AtnInd- PwrInd-
Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
Device: Rlxd...
> commit d2ed16356ff4fb9de23fbc5e5d582ce580390106
I think we need to drop this now. The report that says that this *fixes*
something might have been on regular interrupts only. I currently suspect that
it breaks all MSI interrupts, which would make sense if I look a the code. Very
bad indeed.I'll try to come up with something else or send a patch that reverts it.
Auke
-
I'm going to be off-line for a couple of days, so I just reverted it.
Linus
---
commit b5bf28cde894b3bb3bd25c13a7647020562f9ea0
Author: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date: Wed Feb 21 11:21:44 2007 -0800Revert "e1000: fix shared interrupt warning message"
This reverts commit d2ed16356ff4fb9de23fbc5e5d582ce580390106.
As Thomas Gleixner reports:
"e1000 is not working anymore. ifup fails permanentely.
ADDRCONF(NETDEV_UP): eth0: link is not ready
nothing else"The broken commit was identified with "git bisect".
Auke Kok says:
"I think we need to drop this now. The report that says that this
*fixes* something might have been on regular interrupts only. I
currently suspect that it breaks all MSI interrupts, which would make
sense if I look a the code. Very bad indeed."Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Auke Kok <auke-jan.h.kok@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index a710237..98215fd 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -1417,6 +1417,10 @@ e1000_open(struct net_device *netdev)
if ((err = e1000_setup_all_rx_resources(adapter)))
goto err_setup_rx;+ err = e1000_request_irq(adapter);
+ if (err)
+ goto err_req_irq;
+
e1000_power_up_phy(adapter);if ((err = e1000_up(adapter)))
@@ -1427,10 +1431,6 @@ e1000_open(struct net_device *netdev)
e1000_update_mng_vlan(adapter);
}- err = e1000_request_irq(adapter);
- if (err)
- goto err_req_irq;
-
/* If AMT is enabled, let the firmware know that the network
* interface is now open */
if (adapter->hw.mac_type == e1000_82573 &&
@@ -1439,10 +...
On Wed, 21 Feb 2007 11:24:33 -0800 (PST)
OK, but this change was needed because the new IRQ-debugging code reliably
causes e1000 to explode.So perhaps until e1000 gets sorted out we should disable the debug code:
--- a/lib/Kconfig.debug~a
+++ a/lib/Kconfig.debug
@@ -79,7 +79,7 @@ config DEBUG_KERNELconfig DEBUG_SHIRQ
bool "Debug shared IRQ handlers"
- depends on DEBUG_KERNEL && GENERIC_HARDIRQS
+ depends on DEBUG_KERNEL && GENERIC_HARDIRQS && BROKEN
help
Enable this to generate a spurious interrupt as soon as a shared
interrupt handler is registered, and just before one is deregistered.
_-
| Greg KH | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Greg Kroah-Hartman | [PATCH 005/196] Chinese: add translation of SubmittingDrivers |
| Adrian Bunk | [1/6] 2.6.21-rc2: known regressions |
| Paul Jackson | Re: cpuset-remove-sched-domain-hooks-from-cpusets |
git: | |
| Linus Torvalds | Re: [GIT]: Networking |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Andrew Morton | Re: [BUG] New Kernel Bugs |
