Re: [-mm patch] make gfs2_writepages() static

From: Andrew Morton
Date: Monday, January 29, 2007 - 9:45 pm

Temporarily at

	http://userweb.kernel.org/~akpm/2.6.20-rc6-mm3/

Will appear later at

	ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc6/2.6.20-rc6-mm3/


- Restored git-block.patch: mainly the block unplugging rework.  The
  problematic CFQ updates have been taken out.

- Restored the fsaio patches as a consequence.

- A huge ACPI update.

- A decent number of x86 patches have been temporarily dropped due to their
  clash against the ACPI update.

- A few problems reported against 2.6.20-rc6-mm2 have been fixed.




Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.

- To fetch an -mm tree using git, use (for example)

  git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
  git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1

- -mm kernel commit activity can be reviewed by subscribing to the
  mm-commits mailing list.

        echo "subscribe mm-commits" | mail majordomo@vger.kernel.org

- If you hit a bug in -mm and it is not obvious which patch caused it, it is
  most valuable if you can perform a bisection search to identify which patch
  introduced the bug.  Instructions for this process are at

        http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt

  But beware that this process takes some time (around ten rebuilds and
  reboots), so consider reporting the bug first and if we cannot immediately
  identify the faulty patch, then perform the bisection search.

- When reporting bugs, please try to Cc: the relevant maintainer and mailing
  list on any email.

- When reporting bugs in this kernel via email, please also rewrite the
  email Subject: in some manner to reflect the nature of the bug.  Some
  developers filter by Subject: when looking for messages to read.

- Semi-daily snapshots of the -mm lineup are uploaded to
  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced on
  the mm-commits ...
From: Jeff Garzik
Date: Monday, January 29, 2007 - 9:50 pm

Now that kevent work has settled down, would you be open to including it 
in -mm?

	Jeff


-

From: Andrew Morton
Date: Monday, January 29, 2007 - 10:11 pm

On Mon, 29 Jan 2007 23:50:51 -0500

I just haven't had the bandwidth to track what's been happening there
lately.  The two main things which need to be done are:

- a detailed line-by-line review

- for someone to gain a full understanding of the delta between <these
  patches> and <ulrich> and to explain these differences to mortals and to
  convince themselves and the rest of us that we're all OK.

-

From: Evgeniy Polyakov
Date: Tuesday, January 30, 2007 - 2:56 am

This requires either a mind-reading machine or some feedback from Ulrich.
His last mail about kevent was related to release 25.

As far as I can see, the only questionable parts are the signal mask in
the syscalls (though the nature of kevent signal delivery does not require
it, since the mask of pending signals is not updated if a special flag is
set) and the extra functionality (like hrtimers accessible through the
kevent interface and as a POSIX add-on).

-- 
	Evgeniy Polyakov
-

From: Sunil Naidu
Date: Tuesday, January 30, 2007 - 1:16 am

I hit a compile error! Here is the info:

  CC [M]  drivers/net/chelsio/cxgb2.o
  CC [M]  drivers/net/chelsio/espi.o
  CC [M]  drivers/net/chelsio/tp.o
  CC [M]  drivers/net/chelsio/pm3393.o
  CC [M]  drivers/net/chelsio/sge.o
drivers/net/chelsio/sge.c: In function 't1_interrupt':
drivers/net/chelsio/sge.c:1705: error: expected ')' before 'work_done'
drivers/net/chelsio/sge.c:1722: error: expected expression before '}' token
drivers/net/chelsio/sge.c:1697: warning: unused variable 'work_done'
drivers/net/chelsio/sge.c:1722: warning: no return statement in function returning non-void
make[3]: *** [drivers/net/chelsio/sge.o] Error 1
make[2]: *** [drivers/net/chelsio] Error 2
make[1]: *** [drivers/net] Error 2
make: *** [drivers] Error 2


~Akula2
-

From: Andrew Morton
Date: Tuesday, January 30, 2007 - 1:32 am

On Tue, 30 Jan 2007 13:46:54 +0530

--- a/drivers/net/chelsio/sge.c~git-netdev-all-chelsio-fix
+++ a/drivers/net/chelsio/sge.c
@@ -1701,7 +1701,7 @@ irqreturn_t t1_interrupt(int irq, void *
 
 	writel(F_PL_INTR_SGE_DATA, adapter->regs + A_PL_CAUSE);
 
-	if (likely(responses_pending(adapter))
+	if (likely(responses_pending(adapter)))
 		work_done = process_responses(adapter, -1);
 	else
 		work_done = t1_slow_intr_handler(adapter);
_
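The build failure comes down to one missing closing parenthesis: `likely(responses_pending(adapter))` needs one more `)` to close the `if`. The userspace sketch below reproduces the corrected shape. `likely()` is redefined here as the `__builtin_expect()` wrapper that the kernel essentially uses, and the three helpers are hypothetical stubs, not the chelsio driver's real functions.

```c
/* Userspace stand-in for the kernel's likely() branch hint. */
#define likely(x) __builtin_expect(!!(x), 1)

/* Hypothetical stubs: they only let the control flow compile and are
 * not the real chelsio helpers. */
static int responses_pending(int pending) { return pending; }
static int process_responses(void) { return 1; }
static int slow_intr_handler(void) { return 2; }

/* Same shape as the fixed t1_interrupt() branch. The broken version,
 *     if (likely(responses_pending(pending))
 * is one ')' short, so gcc reports "expected ')' before 'work_done'". */
static int handle_interrupt(int pending)
{
    int work_done;

    if (likely(responses_pending(pending)))
        work_done = process_responses();
    else
        work_done = slow_intr_handler();
    return work_done;
}
```

`likely()` only changes the compiler's branch-layout hint; both arms still run as written.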

-

From: Olivier Galibert
Date: Tuesday, January 30, 2007 - 2:06 am

Want me to update these?  And maybe the other mmconfig related ones if
I can find them.

  OG.
-

From: Andrew Morton
Date: Tuesday, January 30, 2007 - 2:26 am

On Tue, 30 Jan 2007 10:06:45 +0100

Thanks.  That depends upon which of Andi or Len merges first.

If Andi goes first, then Len has rather a bit of hackwork to do.

If Len goes first then things are probably simpler, but that ACPI codedrop
is very new and might have problems.  We wouldn't want to hold the x86
merge back because of it.

For now, I guess we sit back while Len and Andi sort out what they're going
to do.

Len, what was in that merge anyway?  Lots of renaming and shuffling things
around - the sorts of things which are safe as long as they compile OK.  But
was there much substantive material in there as well?

-

From: Olivier Galibert
Date: Tuesday, January 30, 2007 - 11:47 am

It seems heavy in general, but the intersection with mmconfig looks
rather limited:
- s/acpi_table_mcfg_config/acpi_mcfg_allocation/
- s/base_address/address/
- s/pci_segment_group_number/pci_segment/
- address is now 64 bits

The last point is both good and bad.  The i965 needs it (good), but I
don't know if the mapping functions can handle actual 64-bit addresses
(maybe bad), especially on i386.
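Olivier's concern about 64-bit addresses on i386 can be made concrete. The sketch below uses a hypothetical local struct that borrows only the renamed field names from the thread (it is not the ACPICA definition). It checks whether an MMCONFIG (ECAM) region, which takes 1 MiB per bus, lies entirely below 4 GiB. That is the range a non-PAE i386 kernel can address physically.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative local struct using the post-rename names from the thread
 * (address, pci_segment); not the real ACPICA structure. */
struct mcfg_entry {
    uint64_t address;     /* ECAM base, now 64 bits wide */
    uint16_t pci_segment;
    uint8_t  start_bus;
    uint8_t  end_bus;     /* assumed >= start_bus */
};

/* Each bus decodes 32 devices x 8 functions x 4 KiB = 1 MiB. */
static uint64_t mcfg_region_size(const struct mcfg_entry *e)
{
    return ((uint64_t)e->end_bus - e->start_bus + 1) << 20;
}

/* True if the whole region sits below 4 GiB, i.e. it is reachable with
 * 32-bit physical addresses (i386 without PAE). Written so that the
 * addition cannot overflow. */
static bool mcfg_fits_below_4g(const struct mcfg_entry *e)
{
    const uint64_t limit = UINT64_C(1) << 32;

    return e->address < limit &&
           mcfg_region_size(e) <= limit - e->address;
}
```

For example, a region based at 0xe0000000 that covers buses 0-255 ends exactly at the 4 GiB boundary and still fits. Any base above 4 GiB would require the mapping code to handle the wider address.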

  OG.

-

From: Len Brown
Date: Wednesday, January 31, 2007 - 12:57 am

The big thing was the new table manager.
Linux used to have multiple copies of the ACPI tables -- sometimes inconsistent.
Now, we use a single copy of each table.
Indeed, with the exception of the FADT (where we need to convert multiple
versions into a single version), we map the tables directly where the BIOS
gives them to us and thus don't allocate any memory for them at all.

cheers,
-Len
-

From: Len Brown
Date: Wednesday, January 31, 2007 - 9:25 pm

I think chances are actually quite good we'll be able to proceed with
pushing the ACPI table re-write immediately upon 2.6.21 open.

Note that while it is sort of big text-wise, it isn't that complicated --
and failures in this type of code tend to be massively obvious boot failures.
Note also that 2.6.20-rc6-mm3 is not the maiden voyage to -mm for this code.
It has been there several times before as it has matured.

The table re-write broke the HP simulator -- but I think we can fix that quickly.
I don't know yet what broke the HP rx2600.
The sysfs branch is what broke the button and the Altix boot.
Sysfs may or may not go upstream at 2.6.21 open -- but that isn't
the code you are conflicting with here so that is moot.

cheers,
-

From: Maciej Rutecki
Date: Tuesday, January 30, 2007 - 3:18 pm


I have two problems. The first is suspend to disk.

After suspending to disk (before resuming) I checked the time in the BIOS,
and it was correct, but during resume I get this message:

"Suspending console(s)"

and the system waits 20 seconds (or more) before resume finishes. The
system clock was also about 20 seconds behind.

The second problem: the power button doesn't work. When I press it, I get
this error:

ACPI Error (evevent-0305): No installed handler for fixed event [00000002] [20070126]


-- 
Maciej Rutecki <maciej.rutecki@gmail.com>

[Attachment: config-2.6.20-rc6-mm3.gz]
From: Andrew Morton
Date: Tuesday, January 30, 2007 - 3:27 pm

On Tue, 30 Jan 2007 23:18:42 +0100

OK, thanks.  That might be due to the time-management updates as well. 
I'll see if I can reproduce this.

If you're keen, you could test just 2.6.19-rc6+origin.patch+git-acpi.patch
from
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc6/2.6.20-rc6-mm...

That sounds like an acpi regression, yup.
-

From: Karsten Wiese
Date: Tuesday, January 30, 2007 - 5:55 pm

Similar weirdness here on rc6-mm2 and rc6-rt*:
resume from disk waits unduly long.
I played with the BIOS clock setting after I wondered why suspend/resume
wouldn't work overnight:
the longer the (faked or real) suspend-to-disk time, the longer the wait
(not seen on 2.6.18-rt) after the incrementing % display before things
are running again.
After turning the time backwards in the BIOS, the console mouse handler gpm
experiences "interrupted system call".


Some waiting times from rc6-rt6 from memory:

Config                 | HZ  | NO_HZ + HRESTIMERS
-----------------------+-----+------------------------------
CMOS clock unchanged   | 2s  | 6s
CMOS clock += 10 min   |     | 2 min
CMOS clock += 2 months | 20s | > 4 min, test interrupted


      Karsten
-

From: Ingo Molnar
Date: Wednesday, January 31, 2007 - 6:22 am

i'm wondering whether the jiffies update fix from Thomas fixes this bug 
for you.


i've seen something like this on -rt (and incorrectly attributed it to 
-rt) when running on a system which has a serial port and which has a 
kernel console on that serial port. What happens is that after resume 
(and straight after console suspend) every serial character printed 
takes /a lot/ of time - and resume does print a number of kernel messages
to the console. I didn't get any further in debugging this though, but
disabling the serial console made the problem go away.

a possibly related thing: the serial code is sensitive to jiffies 
updates and timers, i saw that during early revisions of the dynticks 
code - but the specifics escape me.

the slowdown could also be something like the kernel somehow wrapping 
around jiffies and thus doing /a lot/ of jiffy ticks? Or it could be a
miscalculation in the amount of jiffies that need updating, resulting in 
a similar number of loops in the jiffy update code.

(i'll try to figure out this regression - but wanted to describe to you 
the known things so far, maybe you'll figure it out faster than me.)

	Ingo
-

From: Karsten Wiese
Date: Wednesday, January 31, 2007 - 7:25 am

Serial port console is off here and the jiffies update fix doesn't make
a noticeable difference.
I've just captured a dmesglog of 3 suspend/resume cycles with printk
timestamps. Config has NO_HZ and HRESTIMERS.
In the first two cycles, while in the BIOS, I advanced the CMOS clock by ~4 minutes;
in the last cycle by ~2 minutes.
After the 
	lapic resume on CPU#0
entries there are roughly proportional timestamp differences of ~60s, ~60s
and ~35s. Those equal the unexpected wait times.
Will look into what happens between "lapic resume on CPU#0" and
"pci 0000:00:00.0: EARLY resume" next.

      Karsten

[    0.000000] Linux version 2.6.20-rc6-rt6.dbg (ka@a64.localdomain) (gcc-Version 4.1.1 20070105 (Red Hat 4.1.1-51)) #3 PREEMPT Wed Jan 31 14:25:07 CET 2007
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] sanitize start
[    0.000000] sanitize end
[    0.000000] copy_e820_map() start: 0000000000000000 size: 000000000009fc00 end: 000000000009fc00 type: 1
[    0.000000] copy_e820_map() type is E820_RAM
[    0.000000] copy_e820_map() start: 000000000009fc00 size: 0000000000000400 end: 00000000000a0000 type: 2
[    0.000000] copy_e820_map() start: 00000000000f0000 size: 0000000000010000 end: 0000000000100000 type: 2
[    0.000000] copy_e820_map() start: 0000000000100000 size: 000000003fef0000 end: 000000003fff0000 type: 1
[    0.000000] copy_e820_map() type is E820_RAM
[    0.000000] copy_e820_map() start: 000000003fff0000 size: 0000000000008000 end: 000000003fff8000 type: 3
[    0.000000] copy_e820_map() start: 000000003fff8000 size: 0000000000008000 end: 0000000040000000 type: 4
[    0.000000] copy_e820_map() start: 00000000fec00000 size: 0000000000001000 end: 00000000fec01000 type: 2
[    0.000000] copy_e820_map() start: 00000000fee00000 size: 0000000000001000 end: 00000000fee01000 type: 2
[    0.000000] copy_e820_map() start: 00000000fff80000 size: 0000000000080000 end: 0000000100000000 type: 2
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
[    ...
From: Ingo Molnar
Date: Thursday, February 1, 2007 - 1:01 am

ok, this eliminates my serial port theory.

and this means i'm having trouble reproducing this problem locally. 
Maybe i tried it the wrong way: does it only occur with suspend-to-disk, 
or suspend-to-ram too? Does it need ACPI suspend-to-disk, or 
software-suspend?

	Ingo
-

From: Karsten Wiese
Date: Thursday, February 1, 2007 - 3:44 am

Haven't checked suspend-to-ram.
I use pm-hibernate or 
"echo reboot > /sys/power/disk; echo disk > /sys/power/state".
That's software-suspend?
I think the wait is caused by an interrupt starting somewhere under
sysdev_resume(void), possibly the lapic timer interrupt? Will try to
trace that.
2.6.20-rc6-rt6 .config attached.
Machine is AMD64 UP, VIA chipset desktop.

      Karsten

From: Karsten Wiese
Date: Thursday, March 1, 2007 - 4:11 am

Some evidence:
[root@a64 Desktop]# echo reboot > /sys/power/disk
[root@a64 Desktop]# cat /proc/interrupts; echo disk > /sys/power/state ; cat /proc/interrupts
<snip>
LOC:    2215504
<snip, cmos clock untouched>
LOC:    2216432
<snip>
[root@a64 Desktop]# cat /proc/interrupts; echo disk > /sys/power/state ; cat /proc/interrupts
<snip>
LOC:    2225752
<snip, cmos clock advanced by 1 month under BIOS settings>
LOC:    2238383
<snip>
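Karsten's `LOC` (local timer interrupt) counts can be differenced as follows. Each pair of readings brackets the whole suspend/resume cycle, so each delta also includes ordinary timer ticks before the image is written and after resume. The comparison is therefore only rough.

```latex
\begin{aligned}
\Delta_{\text{CMOS unchanged}} &= 2216432 - 2215504 = 928 \\
\Delta_{\text{CMOS} + 1\,\text{month}} &= 2238383 - 2225752 = 12631 \\
\frac{12631}{928} &\approx 13.6
\end{aligned}
```

The cycle with the CMOS clock pushed forward by a month therefore saw roughly 13.6 times as many local timer interrupts.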

      Karsten


-

From: Pavel Machek
Date: Friday, February 2, 2007 - 5:37 pm

? What is ACPI suspend-to-disk? There used to be S4bios a *long* time
ago... these days everyone does swsusp.

(Granted, you can select "shutdown" and "platform" flavours...)
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Thomas Gleixner
Date: Thursday, February 1, 2007 - 6:03 am

The reworked highres/dyntick code made a thinko in the resume code
visible, which was magically working in the old queue.

On resume we add the slept time to xtime. This delta must be adjusted in
wall_to_monotonic as well. The update of jiffies64 is bogus and a
leftover of the code which was taken from arch/i386/kernel/time.c.

On suspend the current state of xtime, wall_to_monotonic and jiffies is
frozen. After resume we need to add the slept time to xtime, but we need
to subtract it from wall_to_monotonic, so the monotonic time is resuming
from exactly the point where it was suspended. jiffies are restarting
from the same point as well.

This solves the resume wait time observed by Karsten Wiese.
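Thomas's argument rests on the identity monotonic = xtime + wall_to_monotonic. The sketch below is a simplified, whole-seconds model that contrasts the old resume arithmetic with the patched version. The struct and function names are hypothetical, not kernel code. Only the patched version leaves monotonic time and jiffies where they were at suspend. The model does not try to explain why the observed wait was proportional to the sleep time.

```c
#include <stdint.h>

/* Hypothetical, simplified model of the three quantities the patch touches. */
struct tk_model {
    int64_t  xtime_sec;             /* wall-clock time */
    int64_t  wall_to_monotonic_sec; /* monotonic = xtime + wall_to_monotonic */
    uint64_t jiffies_64;            /* tick counter */
};

static int64_t tk_monotonic_sec(const struct tk_model *tk)
{
    return tk->xtime_sec + tk->wall_to_monotonic_sec;
}

/* Old resume arithmetic: wall time and jiffies both jump forward, and
 * because wall_to_monotonic is untouched, monotonic time jumps too. */
static void tk_resume_old(struct tk_model *tk, int64_t sleep_length,
                          unsigned int hz)
{
    if (sleep_length > 0) {
        tk->xtime_sec += sleep_length;
        tk->jiffies_64 += (uint64_t)sleep_length * hz;
    }
}

/* Patched arithmetic: wall time advances by the slept time, the offset
 * absorbs it, so monotonic time and jiffies resume where they stopped. */
static void tk_resume_fixed(struct tk_model *tk, int64_t sleep_length)
{
    if (sleep_length > 0) {
        tk->xtime_sec += sleep_length;
        tk->wall_to_monotonic_sec -= sleep_length;
    }
}
```

With xtime = 1000 s, wall_to_monotonic = -900 s and a 60 s sleep, the patched path leaves monotonic time at 100 s. The old path moves it to 160 s and adds 60 * HZ jiffies.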

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Index: linux-2.6.20-rc6-mm/kernel/timer.c
===================================================================
--- linux-2.6.20-rc6-mm.orig/kernel/timer.c
+++ linux-2.6.20-rc6-mm/kernel/timer.c
@@ -985,8 +985,9 @@ static int timekeeping_resume(struct sys
 
 	if (now && (now > timekeeping_suspend_time)) {
 		unsigned long sleep_length = now - timekeeping_suspend_time;
+
 		xtime.tv_sec += sleep_length;
-		jiffies_64 += (u64)sleep_length * HZ;
+		wall_to_monotonic.tv_sec -= sleep_length;
 	}
 	/* re-base the last cycle value */
 	clock->cycle_last = clocksource_read(clock);
@@ -994,7 +995,7 @@ static int timekeeping_resume(struct sys
 	timekeeping_suspended = 0;
 	write_sequnlock_irqrestore(&xtime_lock, flags);
 
-	clockevents_notify(CLOCK_EVT_NOTIFY_RESUME, NULL);
+	touch_softlockup_watchdog();
 	/* Resume hrtimers */
 	clock_was_set();
 


-

From: Maciej Rutecki
Date: Wednesday, January 31, 2007 - 4:54 am


I tried 2.6.19-rc6 with origin.patch and git-acpi.patch. The patch
didn't apply correctly (see patch_output.txt). I also get a compile error:


  AS      arch/i386/kernel/entry.o
  CC      arch/i386/kernel/traps.o
  CC      arch/i386/kernel/irq.o
  CC      arch/i386/kernel/ptrace.o
  CC      arch/i386/kernel/time.o
  CC      arch/i386/kernel/ioport.o
  CC      arch/i386/kernel/ldt.o
  CC      arch/i386/kernel/setup.o
In file included from include/acpi/acpi.h:62,
                 from include/linux/acpi.h:37,
                 from arch/i386/kernel/setup.c:31:
include/acpi/acpixf.h:100: warning: 'struct acpi_pointer' declared inside parameter list
include/acpi/acpixf.h:100: warning: its scope is only this definition or declaration, which is probably not what you want
include/acpi/acpixf.h:115: error: expected ')' before 'table_type'
make[2]: *** [arch/i386/kernel/setup.o] Błąd 1
make[1]: *** [arch/i386/kernel] Błąd 2
make[1]: Opuszczenie katalogu `/usr/src/linux-rc'
make: *** [debian/stamp-build-kernel] Błąd 2

"Błąd" = "Error" and "Opuszczenie katalogu" = "Leaving directory" (in Polish).


[Attachment: patch_output.txt.gz]
From: Len Brown
Date: Wednesday, January 31, 2007 - 9:10 pm

Note that you can always get the latest ACPI patch and drop it directly onto Linus' tree here:
http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/patches/test/2.6.20/

-

From: Andrew Morton
Date: Wednesday, January 31, 2007 - 9:17 pm

I'm suspecting that Maciej misapplied the -mm patch somehow:

box:/usr/src/foo> grep acpi_pointer include/acpi/acpixf.h
box:/usr/src/foo> 
-

From: Tilman Schmidt
Date: Tuesday, January 30, 2007 - 6:16 pm

Same here, minus the message. (Or perhaps I just don't know where to look.)
Problem also exists in 2.6.20-rc6-mm2. With 2.6.20-rc6-git1 the power
button of this machine works fine.

-- 
Tilman Schmidt                          E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)

From: Andrew Morton
Date: Tuesday, January 30, 2007 - 6:25 pm

On Wed, 31 Jan 2007 02:16:43 +0100

That's significant - in your case at least the 2.6.20-rc6-mm3 ACPI update
isn't the cause.

-

From: Tilman Schmidt
Date: Wednesday, January 31, 2007 - 4:38 am

Is there anything specific I should test, or is it Big Bisect time?


From: Alexey Starikovskiy
Date: Wednesday, January 31, 2007 - 5:29 am

This patch should fix the issue...

Regards,
    Alex.
From: Maciej Rutecki
Date: Wednesday, January 31, 2007 - 9:02 am

Yes, after adding this patch the power button works fine. Thanks


From: Tilman Schmidt
Date: Wednesday, January 31, 2007 - 11:28 am

On 31.01.2007 13:29, Alexey Starikovskiy wrote:

It does. Power button works again.

Thanks,



From: Mattia Dongili
Date: Wednesday, January 31, 2007 - 2:52 pm

I jumped from rc2-mm1 to rc6-mm3 and tried dynticks for the first time:

...
Time: acpi_pm clocksource has been installed.
Clocksource tsc unstable (delta = -275154141 ns)
BUG: soft lockup detected on CPU#1!
 [<c0104dac>] show_trace_log_lvl+0x1a/0x2f
 [<c010540b>] show_trace+0x12/0x14
 [<c010548f>] dump_stack+0x16/0x18
 [<c0146e50>] softlockup_tick+0xa7/0xb6
 [<c01286c0>] run_local_timers+0x12/0x14
 [<c0128a61>] update_process_times+0x3e/0x63
 [<c01374c5>] tick_nohz_handler+0x7d/0xe3
 [<c01137c2>] smp_apic_timer_interrupt+0x71/0x83
 [<c01048f4>] apic_timer_interrupt+0x28/0x30
 [<c013709e>] tick_broadcast_oneshot_control+0x12/0xc8
 [<c0136a58>] tick_notify+0x1cd/0x241
 [<c012bd4e>] notifier_call_chain+0x2b/0x55
 [<c012bde2>] __raw_notifier_call_chain+0x19/0x1e
 [<c012be01>] raw_notifier_call_chain+0x1a/0x1c
 [<c01364ea>] clockevents_do_notify+0x11/0x13
 [<c013672d>] clockevents_notify+0x1c/0x53
 [<f8f7b5cb>] acpi_state_timer_broadcast+0x2e/0x31 [processor]
 [<f8f7c12f>] acpi_processor_idle+0x276/0x40b [processor]
 [<c0102435>] cpu_idle+0xad/0xd3
 [<c0112975>] start_secondary+0x32b/0x333
 [<00000000>] run_init_process+0x3fefed10/0x19
 =======================

Full dmesg and config:
http://oioio.altervista.org/linux/nohz_soft-lockup.dmesg
http://oioio.altervista.org/linux/config-2.6.20-rc6-mm3-1

As a side note, the process becomes slower and slower as it proceeds;
it's definitely noticeable during my iptables rules setup (nothing that
complex, just default policies and subnet/LAN accept rules).
Building with NO_HZ=n right now.

-- 
-

From: Mattia Dongili
Date: Wednesday, January 31, 2007 - 4:21 pm

yes, slowness is gone. Any useful information I can provide?

-- 
-

From: Ingo Molnar
Date: Thursday, February 1, 2007 - 12:04 pm

thanks for reporting this - i'll try your config. There's one fix on top
of -mm3 - see below - but i'm not sure it's related, it addresses resume
problems.

	Ingo

---
 kernel/timer.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: linux/kernel/timer.c
===================================================================
--- linux.orig/kernel/timer.c
+++ linux/kernel/timer.c
@@ -1120,8 +1120,9 @@ static int timekeeping_resume(struct sys
 
 	if (now && (now > timekeeping_suspend_time)) {
 		unsigned long sleep_length = now - timekeeping_suspend_time;
+
 		xtime.tv_sec += sleep_length;
-		jiffies_64 += (u64)sleep_length * HZ;
+		wall_to_monotonic.tv_sec -= sleep_length;
 	}
 	/* re-base the last cycle value */
 	clock->cycle_last = clocksource_read(clock);
@@ -1130,7 +1131,7 @@ static int timekeeping_resume(struct sys
 	warp_check_clock_was_changed();
 	write_sequnlock_irqrestore(&xtime_lock, flags);
 
-	clockevents_notify(CLOCK_EVT_NOTIFY_RESUME, NULL);
+	touch_softlockup_watchdog();
 	/* Resume hrtimers */
 	clock_was_set();
 
-

From: Thomas Gleixner
Date: Thursday, February 1, 2007 - 12:36 pm

Can you please try with CONFIG_ACPI_PROCESSOR=y instead of =m ? This
should make the slowness go away too.

I think I know what happens. I'll try to reproduce it with
CONFIG_ACPI_PROCESSOR=m.

	tglx


-

From: Thomas Gleixner
Date: Thursday, February 1, 2007 - 1:01 pm

Hmm. Not reproducible on my jinxed Sony. 

It might be helpful if you could try with your original config again.
Please enable printk timestamps and SysRq. Once the slowness kicks in
please issue a SysRq-Q, so we can look at the internal state of the tick
code.

Thanks,

	tglx


-

From: Mattia Dongili
Date: Thursday, February 1, 2007 - 2:11 pm

dmesg is below. I need to say that the printk times are bogus wrt the
actual time passing, and at one point I got tired of waiting and killed all
tasks. Ah, I have Ingo's resume-fix patch applied here.

I also played a little with (date; sleep 1; date):
Thu Feb  1 21:56:16 CET 2007
Thu Feb  1 21:56:17 CET 2007

Thu Feb  1 21:57:01 CET 2007
Thu Feb  1 21:57:05 CET 2007

Thu Feb  1 21:57:11 CET 2007
Thu Feb  1 21:57:15 CET 2007

Thu Feb  1 21:57:21 CET 2007
Thu Feb  1 21:57:25 CET 2007
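The `date; sleep 1; date` check above can be made more precise by timing the sleep against CLOCK_MONOTONIC. The following is a generic userspace sketch, not something used in the thread, and `measure_sleep_ms` is a hypothetical helper name. On a healthy kernel a 1000 ms request should report close to 1000 ms. The runs above suggest an affected kernel would report several times that, consistent with the roughly 4 s gaps.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Sleep for `ms` milliseconds and return the elapsed time actually
 * observed on CLOCK_MONOTONIC, in milliseconds. */
static double measure_sleep_ms(long ms)
{
    struct timespec req = {
        .tv_sec = ms / 1000,
        .tv_nsec = (ms % 1000) * 1000000L,
    };
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* On EINTR, nanosleep() stores the unslept remainder back into req. */
    while (nanosleep(&req, &req) == -1 && errno == EINTR)
        ;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec) * 1000.0 +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

Calling `measure_sleep_ms(1000)` in a loop during boot or while the iptables script runs would show whether timer-based sleeps stretch.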

dmesg:

[    0.000000] Linux version 2.6.20-rc6-mm3-1 (mattia@tadamune) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #7 SMP Thu Feb 1 21:44:52 CET 2007
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] sanitize start
[    0.000000] sanitize end
[    0.000000] copy_e820_map() start: 0000000000000000 size: 000000000009f800 end: 000000000009f800 type: 1
[    0.000000] copy_e820_map() type is E820_RAM
[    0.000000] copy_e820_map() start: 000000000009f800 size: 0000000000000800 end: 00000000000a0000 type: 2
[    0.000000] copy_e820_map() start: 00000000000dc000 size: 0000000000024000 end: 0000000000100000 type: 2
[    0.000000] copy_e820_map() start: 0000000000100000 size: 000000003fd70000 end: 000000003fe70000 type: 1
[    0.000000] copy_e820_map() type is E820_RAM
[    0.000000] copy_e820_map() start: 000000003fe70000 size: 0000000000090000 end: 000000003ff00000 type: 4
[    0.000000] copy_e820_map() start: 000000003ff00000 size: 0000000000100000 end: 0000000040000000 type: 2
[    0.000000] copy_e820_map() start: 00000000e0000000 size: 0000000010000000 end: 00000000f0000000 type: 2
[    0.000000] copy_e820_map() start: 00000000fec00000 size: 0000000000010000 end: 00000000fec10000 type: 2
[    0.000000] copy_e820_map() start: 00000000fed14000 size: 0000000000006000 end: 00000000fed1a000 type: 2
[    0.000000] copy_e820_map() start: 00000000fed1c000 size: 0000000000074000 end: 00000000fed90000 type: 2
[    0.000000] copy_e820_map() start: 00000000fee00000 size: ...
From: Thomas Gleixner
Date: Thursday, February 1, 2007 - 3:33 pm

Mattia,


Ok, does not affect your problem. 


Sigh. This APIC calibration madness seems to be spreading (especially on



delta is 62 jiffies = 62 * 4ms which is consistent with the idle_expires



And the broadcast event is set for the next CPU#1 event, but the expiry
time is far away from the idle_expires time above.
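Here is Thomas's jiffy arithmetic written out. The 4 ms jiffy corresponds to HZ=250. That HZ value is inferred from the numbers and is not stated for Mattia's config.

```latex
\frac{1\,\text{s}}{\mathrm{HZ}} = \frac{1\,\text{s}}{250} = 4\,\text{ms},
\qquad 62 \times 4\,\text{ms} = 248\,\text{ms}
```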

I'm a bit puzzled and too tired to spot the bug right now.

May I ask you for another test? Please turn on high resolution timers
and check if the same strange behaviour happens.

Thanks. 

	tglx


-

From: Mattia Dongili
Date: Friday, February 2, 2007 - 12:18 pm

Cc-ing netdev and netfilter-devel, the beginning of the thread is here
http://lkml.org/lkml/2007/1/31/306


Yep, here we go again. Still seeing long stalls but no negative expires
offset.
One more test I did was disabling my iptables script, and then the
boot process went fine. The script is just:

  #!/bin/sh
  iptables -F INPUT
  iptables -F FORWARD
  iptables -F OUTPUT
  iptables -P INPUT DROP
  iptables -P FORWARD ACCEPT
  iptables -A INPUT -i lo -j ACCEPT
  iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
  iptables -A INPUT -p tcp --dport ssh -j ACCEPT
  # LAN 
  iptables -I INPUT -s 10.0.0.0/8 -j ACCEPT
  # LAN UML
  iptables -I INPUT -s 172.20.0.0/16 -j ACCEPT
  echo "iptables: MASQUERADING for virtual machines"
  iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
  iptables -t nat -A POSTROUTING -o eth2 -j MASQUERADE
  sysctl -w net.ipv4.ip_forward=1

and executing it from a shell once the boot process is done doesn't
generate all that strangeness/slowness...

Dmesg with iptables script enabled:

[    0.000000] Linux version 2.6.20-rc6-mm3-1 (mattia@tadamune) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #8 SMP Fri Feb 2 10:26:07 CET 2007
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] sanitize start
[    0.000000] sanitize end
[    0.000000] copy_e820_map() start: 0000000000000000 size: 000000000009f800 end: 000000000009f800 type: 1
[    0.000000] copy_e820_map() type is E820_RAM
[    0.000000] copy_e820_map() start: 000000000009f800 size: 0000000000000800 end: 00000000000a0000 type: 2
[    0.000000] copy_e820_map() start: 00000000000dc000 size: 0000000000024000 end: 0000000000100000 type: 2
[    0.000000] copy_e820_map() start: 0000000000100000 size: 000000003fd70000 end: 000000003fe70000 type: 1
[    0.000000] copy_e820_map() type is E820_RAM
[    0.000000] copy_e820_map() start: 000000003fe70000 size: 0000000000090000 end: 000000003ff00000 type: 4
[    0.000000] copy_e820_map() start: 000000003ff00000 size: ...
From: Thomas Gleixner
Date: Friday, February 2, 2007 - 1:27 pm

Mattia,

I have it halfway reproducible now and I'm working to find the root
cause. Thanks for providing the info.

	tglx


-

From: Mattia Dongili
Date: Friday, February 2, 2007 - 1:43 pm

Great, I'm obviously available to test any patch :)

-- 
-

From: Ingo Molnar
Date: Tuesday, February 6, 2007 - 9:48 am

Mattia,


Could you try the patch below? The RCU serialization code (a rare call 
but can be common in some types of setups) has a nasty implicit 
dependency on the HZ tick - which until now was a hidden wart but became 
an explicit bug under dynticks. Maybe this is what is slowing down your 
box.

	Ingo

------------------------->
Subject: [patch] dynticks: make sure synchronize_rcu() completes
From: Ingo Molnar <mingo@elte.hu>

synchronize_rcu() has a nasty implicit dependency on the HZ tick: it 
relies on another CPU finishing all RCU work so that this CPU can finish 
its RCU work too - in IRQ context. But wait_for_completion() goes to 
sleep indefinitely on dynticks and there might be no other IRQs to this 
CPU for a long time.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/rcupdate.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Index: linux/kernel/rcupdate.c
===================================================================
--- linux.orig/kernel/rcupdate.c
+++ linux/kernel/rcupdate.c
@@ -85,8 +85,13 @@ void synchronize_rcu(void)
 	/* Will wake me after RCU finished */
 	call_rcu(&rcu.head, wakeme_after_rcu);
 
-	/* Wait for it */
-	wait_for_completion(&rcu.completion);
+	/*
+	 * Wait for it. Note: on dynticks RCU completion needs to be
+	 * polled frequently, to make sure we finish work. If this CPU
+	 * goes idle then another CPU cannot finish this CPU's work.
+	 */
+	while (wait_for_completion_timeout(&rcu.completion, HZ/100 ? : 1) == 0)
+		/* nothing */;
 }
 
 static void rcu_barrier_callback(struct rcu_head *notused)
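The polling interval in Ingo's patch is `HZ/100 ? : 1` jiffies. It uses the GNU `?:` extension, where `a ? : b` yields `a` if it is non-zero and `b` otherwise. The portable sketch below (the helper names are hypothetical) shows what the expression works out to for common HZ values. The fallback to 1 presumably guards against a very low HZ producing a zero timeout, which would turn the wait loop into a busy spin.

```c
/* Per-iteration timeout, in jiffies, of the patched synchronize_rcu()
 * loop: HZ/100 (about 10 ms), but never 0. Portable spelling of the
 * GNU "HZ/100 ? : 1". */
static unsigned long rcu_poll_timeout(unsigned long hz)
{
    unsigned long t = hz / 100;

    return t ? t : 1;
}

/* Convert a jiffy count to milliseconds for a given HZ. */
static unsigned long jiffies_to_ms(unsigned long j, unsigned long hz)
{
    return j * 1000 / hz;
}
```

At HZ=250 the loop therefore returns from `wait_for_completion_timeout()` every 2 jiffies (8 ms) and waits again until the completion fires.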
-

From: Mattia Dongili
Date: Tuesday, February 6, 2007 - 12:28 pm

No, not this. Anyway the last patch Thomas forwarded does fix the
problem.

By the way, I have all the patches I received stacked up, if you want me
to test some different combination, just ask.

Thanks
-- 
-

From: Tilman Schmidt
Date: Tuesday, February 6, 2007 - 4:12 pm

I have the same problem (huge delay when loading iptables) with

Which one would that be? I might try it for comparison.

Thanks,
Tilman


From: Thomas Gleixner
Date: Tuesday, February 6, 2007 - 4:17 pm

Find the combined patch of all fixlets on top of -mm3 below.

	tglx

Index: linux-2.6.20/kernel/timer.c
===================================================================
--- linux-2.6.20.orig/kernel/timer.c
+++ linux-2.6.20/kernel/timer.c
@@ -985,8 +985,9 @@ static int timekeeping_resume(struct sys
 
 	if (now && (now > timekeeping_suspend_time)) {
 		unsigned long sleep_length = now - timekeeping_suspend_time;
+
 		xtime.tv_sec += sleep_length;
-		jiffies_64 += (u64)sleep_length * HZ;
+		wall_to_monotonic.tv_sec -= sleep_length;
 	}
 	/* re-base the last cycle value */
 	clock->cycle_last = clocksource_read(clock);
@@ -994,7 +995,7 @@ static int timekeeping_resume(struct sys
 	timekeeping_suspended = 0;
 	write_sequnlock_irqrestore(&xtime_lock, flags);
 
-	clockevents_notify(CLOCK_EVT_NOTIFY_RESUME, NULL);
+	touch_softlockup_watchdog();
 	/* Resume hrtimers */
 	clock_was_set();
 
Index: linux-2.6.20/kernel/time/clockevents.c
===================================================================
--- linux-2.6.20.orig/kernel/time/clockevents.c
+++ linux-2.6.20/kernel/time/clockevents.c
@@ -42,8 +42,8 @@ unsigned long clockevent_delta2ns(unsign
 	u64 clc = ((u64) latch << evt->shift);
 
 	do_div(clc, evt->mult);
-	if (clc < KTIME_MONOTONIC_RES.tv64)
-		clc = KTIME_MONOTONIC_RES.tv64;
+	if (clc < 1000)
+		clc = 1000;
 	if (clc > LONG_MAX)
 		clc = LONG_MAX;
 
@@ -72,18 +72,22 @@ void clockevents_set_mode(struct clock_e
  *
  * Returns 0 on success, -ETIME when the event is in the past.
  */
-int clockevents_program_event(struct clock_event_device *dev, ktime_t expires)
+int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
+			      ktime_t now)
 {
 	unsigned long long clc;
 	int64_t delta;
 
-	delta = ktime_to_ns(ktime_sub(expires, ktime_get()));
+	delta = ktime_to_ns(ktime_sub(expires, now));
 
 	if (delta <= 0)
 		return -ETIME;
 
 	dev->next_event = expires;
 
+	if (dev->mode == CLOCK_EVT_MODE_SHUTDOWN)
+		return ...
From: Andrew Morton
Date: Tuesday, February 6, 2007 - 6:01 pm

On Wed, 07 Feb 2007 00:17:33 +0100

err, I don't have most of this.

I just uploaded the crappile-of-the-moment to
   ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/broken-out-2007-02-06-16-59.tar.gz
-

From: Ingo Molnar
Date: Wednesday, February 7, 2007 - 12:33 pm

hm:

 ERROR 404: Not Found.

pls. do:

ssh master.kernel.org chmod a+r /pub/linux/kernel/people/akpm/mm/broken-out-2007-02-06-16-59.tar.gz

	Ingo
-

From: Mattia Dongili
Date: Thursday, February 1, 2007 - 2:37 pm

BTW: booting with clocksource=pmtmr makes it proceed a little better, but
I still experience stalls:


[    0.000000] Linux version 2.6.20-rc6-mm3-1 (mattia@tadamune) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #7 SMP Thu Feb 1 21:44:52 CET 2007
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] sanitize start
[    0.000000] sanitize end
[    0.000000] copy_e820_map() start: 0000000000000000 size: 000000000009f800 end: 000000000009f800 type: 1
[    0.000000] copy_e820_map() type is E820_RAM
[    0.000000] copy_e820_map() start: 000000000009f800 size: 0000000000000800 end: 00000000000a0000 type: 2
[    0.000000] copy_e820_map() start: 00000000000dc000 size: 0000000000024000 end: 0000000000100000 type: 2
[    0.000000] copy_e820_map() start: 0000000000100000 size: 000000003fd70000 end: 000000003fe70000 type: 1
[    0.000000] copy_e820_map() type is E820_RAM
[    0.000000] copy_e820_map() start: 000000003fe70000 size: 0000000000090000 end: 000000003ff00000 type: 4
[    0.000000] copy_e820_map() start: 000000003ff00000 size: 0000000000100000 end: 0000000040000000 type: 2
[    0.000000] copy_e820_map() start: 00000000e0000000 size: 0000000010000000 end: 00000000f0000000 type: 2
[    0.000000] copy_e820_map() start: 00000000fec00000 size: 0000000000010000 end: 00000000fec10000 type: 2
[    0.000000] copy_e820_map() start: 00000000fed14000 size: 0000000000006000 end: 00000000fed1a000 type: 2
[    0.000000] copy_e820_map() start: 00000000fed1c000 size: 0000000000074000 end: 00000000fed90000 type: 2
[    0.000000] copy_e820_map() start: 00000000fee00000 size: 0000000000001000 end: 00000000fee01000 type: 2
[    0.000000] copy_e820_map() start: 00000000ff000000 size: 0000000001000000 end: 0000000100000000 type: 2
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
[    0.000000]  BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000dc000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: ...
From: Mattia Dongili
Date: Thursday, February 1, 2007 - 1:24 pm

not much luck. Actually the general slowness is gone, but at netfilter
registration the boot almost stops.

I'll provide the info you requested shortly.
-- 
-

From: Christoph Lameter
Date: Wednesday, January 31, 2007 - 5:14 pm

MD hung again as before so I compiled a kernel 
without it. Next XFS started hanging during bootup.

Some traces of processes hung but I do not have a clue as to what is 
wrong here...:

Call Trace:
 [<a00000010074c1b0>] schedule+0x1bf0/0x1ec0
                                sp=e00000301560fac0 bsp=e000003015608fc8
 [<a0000001003ba350>] xfs_buf_iorequest+0x130/0x820
                                sp=e00000301560fbd0 bsp=e000003015608f58
 [<a0000001003c5b00>] xfs_bdstrat_cb+0x60/0x100
                                sp=e00000301560fc00 bsp=e000003015608f38
 [<a0000001003b2ba0>] xfs_bwrite+0xe0/0x1e0
                                sp=e00000301560fc00 bsp=e000003015608f00
 [<a0000001003a3980>] xfs_syncsub+0x2c0/0x520
                                sp=e00000301560fc00 bsp=e000003015608eb0
 [<a0000001003a3d30>] xfs_sync+0x70/0xa0
                                sp=e00000301560fc00 bsp=e000003015608e88
 [<a0000001003cb400>] vfs_sync+0xa0/0xc0
                                sp=e00000301560fc00 bsp=e000003015608e58
 [<a0000001003c8910>] xfs_fs_write_super+0x70/0xa0
                                sp=e00000301560fc00 bsp=e000003015608e38
 [<a00000010016d490>] sync_supers+0x150/0x260
                                sp=e00000301560fc00 bsp=e000003015608e08
 [<a000000100115820>] wb_kupdate+0x60/0x280
                                sp=e00000301560fc00 bsp=e000003015608dc8
 [<a000000100116570>] pdflush+0x330/0x4e0
                                sp=e00000301560fc50 bsp=e000003015608d90
 [<a0000001000d2ac0>] kthread+0x220/0x2a0
                                sp=e00000301560fd50 bsp=e000003015608d48
 [<a000000100010a50>] kernel_thread_helper+0xd0/0x100
                                sp=e00000301560fe30 bsp=e000003015608d20
 [<a000000100009140>] start_kernel_thread+0x20/0x40
                                sp=e00000301560fe30 bsp=e000003015608d20
                                                                                             
Call Trace:
 [<a00000010074c1b0>] ...
From: Andrew Morton
Date: Wednesday, January 31, 2007 - 5:24 pm

On Wed, 31 Jan 2007 16:14:10 -0800 (PST)

ow.  Please don't make me drop git-block-and-lots-of-other-things again.

-

From: Christoph Lameter
Date: Wednesday, January 31, 2007 - 5:27 pm

Yes, 2.6.20-rc6-mm2 was okay. Sorry.
-

From: Andrew Morton
Date: Wednesday, January 31, 2007 - 5:36 pm

On Wed, 31 Jan 2007 16:27:16 -0800 (PST)

OK, thanks.

Actually, we might not have lost an IO: it could be that we're simply
missing an unplug.  Are you able to unblock things by forcing some other IO
against that queue?  Say, do a read from /dev/sda?

-

From: Christoph Lameter
Date: Wednesday, January 31, 2007 - 5:38 pm

The system does not come up to a prompt. The traces were taken via NMI 
during bootup (shortly after udev came up).

-

From: David Chinner
Date: Wednesday, January 31, 2007 - 11:20 pm

Could be - I can't be certain, but I think we've got one thread
waiting for a buffer to be unpinned before it is written, and
the other thread waiting for log I/O to complete. The first thread
won't unplug the device, and the log I/O is async so it won't either.

What are the new unplugging rules introduced by the git-block
patch? How do they differ from the existing rules?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-

From: Andrew Morton
Date: Thursday, February 1, 2007 - 12:12 am

Pretty simple: you read the largely-useless changelog then call the bravely
uncommented blk_plug_current() when you're about to submit some IO and you
call the audaciously uncommented blk_unplug_current() when you've finished
and you're ready to let it rip.

But usually none of that is necessary, because io_schedule() does all the
work for you.

err, this might help.

--- a/fs/xfs/linux-2.6/xfs_buf.c~git-block-xfs-fix
+++ a/fs/xfs/linux-2.6/xfs_buf.c
@@ -979,7 +979,7 @@ xfs_buf_wait_unpin(
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		if (atomic_read(&bp->b_pin_count) == 0)
 			break;
-		schedule();
+		io_schedule();
 	}
 	remove_wait_queue(&bp->b_waiters, &wait);
 	set_current_state(TASK_RUNNING);
_

-

From: Christoph Lameter
Date: Thursday, February 1, 2007 - 12:01 pm

Well, okay, boot progresses further (maybe only on this boot), but the system is 
still hung.

Traces (this was a backtrace of all processes on the system. I removed 
the irrelevant ones):

Delaying for 5 seconds...
All OS INIT slaves have reached rendezvous
Processes interrupted by INIT - 0 (cpu 0 task 0xa000000100b24000) 0 (cpu 1 
task 0xe00000b003bd8000) 0 (cpu 2 task 0xe000023c38248000) 0 (cpu 3 task 
0xe00000b003d00000) 0 (cpu 4 task 0xe000023c38258000) 0 (cpu 5 task 
0xe00000b003d10000) 0 (cpu 6 task 0xe000023c38268000) 0 (cpu 7 task 
0xe00000b003d20000) 0 (cpu 8 task 0xe000023c382e8000) 0 (cpu 9 task 
0xe00000b003d30000) 0 (cpu 10 task 0xe000023c38380000) 0 (cpu 11 task 
0xe00000b003d40000)


Backtrace of pid 1 (init)

Call Trace:
 [<a000000100799bb0>] schedule+0x1bf0/0x1ec0
                                sp=e00000307bd178c0 bsp=e00000307bd11828
 [<a000000100797b20>] __down+0x240/0x280
                                sp=e00000307bd179d0 bsp=e00000307bd117e8
 [<a0000001003b8f10>] xfs_buf_iowait+0x50/0x80
                                sp=e00000307bd17a00 bsp=e00000307bd117c8
 [<a0000001003bb120>] xfs_buf_iostart+0x1a0/0x1c0
                                sp=e00000307bd17a00 bsp=e00000307bd117a0
 [<a0000001003bc140>] xfs_buf_read_flags+0xe0/0x160
                                sp=e00000307bd17a00 bsp=e00000307bd11768
 [<a00000010039e490>] xfs_trans_read_buf+0x50/0x6a0
                                sp=e00000307bd17a00 bsp=e00000307bd11710
 [<a0000001003420a0>] xfs_btree_read_bufl+0xe0/0x120
                                sp=e00000307bd17a00 bsp=e00000307bd116d0
 [<a000000100334de0>] xfs_bmap_read_extents+0x2c0/0x7e0
                                sp=e00000307bd17a10 bsp=e00000307bd11658
 [<a000000100376820>] xfs_iread_extents+0x160/0x1c0
                                sp=e00000307bd17a20 bsp=e00000307bd11618
 [<a000000100330b90>] xfs_bmapi+0x430/0x33a0
                                sp=e00000307bd17a20 bsp=e00000307bd114c8
 [<a00000010037d3e0>] ...
From: Jens Axboe
Date: Thursday, February 1, 2007 - 12:18 pm

That down() probably wants a replug to precede it. Probably something
like:

        if (atomic_read(&bp->b_io_remaining))
                blk_replug_current_nested();

for xfs_buf_wait_unpin() and xfs_buf_lock(). Does this fix it?

diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c
index f2bdf8b..1ef226e 100644
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -909,6 +909,8 @@ xfs_buf_lock(
 	xfs_buf_t		*bp)
 {
 	XB_TRACE(bp, "lock", 0);
+	if (atomic_read(&bp->b_io_remaining))
+		blk_replug_current_nested();
 	down(&bp->b_sema);
 	XB_SET_OWNER(bp);
 	XB_TRACE(bp, "locked", 0);
@@ -979,7 +981,7 @@ xfs_buf_wait_unpin(
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		if (atomic_read(&bp->b_pin_count) == 0)
 			break;
-		schedule();
+		io_schedule();
 	}
 	remove_wait_queue(&bp->b_waiters, &wait);
 	set_current_state(TASK_RUNNING);
@@ -1291,6 +1293,8 @@ xfs_buf_iowait(
 	xfs_buf_t		*bp)
 {
 	XB_TRACE(bp, "iowait", 0);
+	if (atomic_read(&bp->b_io_remaining))
+		blk_replug_current_nested();
 	down(&bp->b_iodonesema);
 	XB_TRACE(bp, "iowaited", (long)bp->b_error);
 	return bp->b_error;
@@ -1682,6 +1686,7 @@ xfsbufd(
 	xfs_buf_t		*bp, *n;
 	struct list_head	*dwq = &target->bt_delwrite_queue;
 	spinlock_t		*dwlk = &target->bt_delwrite_lock;
+	int			count;
 
 	current->flags |= PF_MEMALLOC;
 
@@ -1697,6 +1702,7 @@ xfsbufd(
 		schedule_timeout_interruptible(
 			xfs_buf_timer_centisecs * msecs_to_jiffies(10));
 
+		count = 0;
 		age = xfs_buf_age_centisecs * msecs_to_jiffies(10);
 		spin_lock(dwlk);
 		list_for_each_entry_safe(bp, n, dwq, b_list) {
@@ -1716,6 +1722,7 @@ xfsbufd(
 						 _XBF_RUN_QUEUES);
 				bp->b_flags |= XBF_WRITE;
 				list_move_tail(&bp->b_list, &tmp);
+				count++;
 			}
 		}
 		spin_unlock(dwlk);
@@ -1730,6 +1737,8 @@ xfsbufd(
 
 		if (as_list_len > 0)
 			purge_addresses();
+		if (count)
+			blk_replug_current_nested();
 
 		clear_bit(XBT_FORCE_FLUSH, &target->bt_flags);
 	} while ...
From: Christoph Lameter
Date: Thursday, February 1, 2007 - 1:18 pm

No it still hangs consistently. This time at an earlier spot.


All OS INIT slaves have reached rendezvous
Processes interrupted by INIT - 0 (cpu 0 task 0xa000000100b24000) 0 (cpu 1 
task 0xe00000b003bd8000) 0 (cpu 2 task 0xe000023c38248000) 0 (cpu 3 task 
0xe00000b003d00000) 0 (cpu 4 task 0xe000023c38258000) 0 (cpu 5 task 
0xe00000b003d10000) 0 (cpu 6 task 0xe000023c38268000) 0 (cpu 7 task 
0xe00000b003d20000) 0 (cpu 8 task 0xe000023c382e8000) 0 (cpu 9 task 
0xe00000b003d30000) 0 (cpu 10 task 0xe000023c38380000) 0 (cpu 11 task 
0xe00000b003d40000)


Backtrace of pid 223 (pdflush)

Call Trace:
 [<a000000100799ed0>] schedule+0x1bf0/0x1ec0
                                sp=e0000030156879d0 bsp=e0000030156811a8
 [<a0000001000db490>] synchronize_qrcu+0x170/0x1e0
                                sp=e000003015687ae0 bsp=e000003015681170
 [<a0000001003f6bc0>] __make_request+0x160/0x880
                                sp=e000003015687b10 bsp=e000003015681130
 [<a0000001003f17c0>] generic_make_request+0x4a0/0x520
                                sp=e000003015687b30 bsp=e0000030156810f8
 [<a0000001003f78f0>] submit_bio+0x2f0/0x320
                                sp=e000003015687b50 bsp=e0000030156810b0
 [<a0000001003bab20>] xfs_buf_iorequest+0x740/0x820
                                sp=e000003015687b70 bsp=e000003015681040
 [<a0000001003869d0>] xlog_bdstrat_cb+0x50/0xe0
                                sp=e000003015687ba0 bsp=e000003015681020
 [<a000000100384150>] xlog_state_release_iclog+0x770/0xcc0
                                sp=e000003015687ba0 bsp=e000003015680fc0
 [<a000000100384860>] xlog_state_sync_all+0x1c0/0x460
                                sp=e000003015687ba0 bsp=e000003015680f60
 [<a000000100385010>] _xfs_log_force+0xd0/0x5c0
                                sp=e000003015687bd0 bsp=e000003015680f00
 [<a0000001003a3720>] xfs_syncsub+0x40/0x520
                                sp=e000003015687c00 bsp=e000003015680eb0
 [<a0000001003a3d50>] xfs_sync+0x70/0xa0
        ...
From: Jens Axboe
Date: Thursday, February 1, 2007 - 1:26 pm

That looks like barriers; could you try with those disabled? Sorry for
making you go through this; I can't debug and fix it myself before
Monday.

-- 
Jens Axboe

-

From: Christoph Lameter
Date: Thursday, February 1, 2007 - 4:02 pm

Disabling barriers plus your patch works: I modified /etc/fstab and added a 
nobarrier option to the root filesystem. If I take your patch out, the 
system hangs again.



-

From: Jens Axboe
Date: Monday, February 5, 2007 - 5:02 am

I can't reproduce this. Can you see if this debug patch catches
anything? You need to enable barriers again.

diff --git a/kernel/sched.c b/kernel/sched.c
index e209901..00c2ab9 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3434,6 +3434,10 @@ asmlinkage void __sched schedule(void)
 			print_irqtrace_events(current);
 		dump_stack();
 	}
+	if (unlikely(current->io_context && current->io_context->plugged)) {
+		printk(KERN_ERR "%s: schedules plugged\n", current->comm);
+		print_irqtrace_events(current);
+	}
 	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
 
 need_resched:

-- 
Jens Axboe

-

From: Jens Axboe
Date: Monday, February 5, 2007 - 5:17 am

Never mind, that was too aggressive. I'll come up with a better debug
patch.

-- 
Jens Axboe

-

From: Jens Axboe
Date: Monday, February 5, 2007 - 5:56 am

Alright, try this one. It should show whether this is a missing unplug
or not (which I think it is, hence the stall in qrcu sync). A process
may legitimately block with plugged requests; that sometimes happens for
bio/rq allocation etc. In that case we do want to unplug anyway, though,
as I don't think we should hold requests plugged, even for a merge, if we
are going to block. And the below means we can move this out of
io_schedule() and eliminate the io_schedule() requirement that I'm not
too fond of, as I don't want to reintroduce all the problems we had with
missing unplugs in the 2.4 kernels.

But for now this is just a debug test, can you see if xfs with barriers
for that kernel now works as expected?

diff --git a/kernel/sched.c b/kernel/sched.c
index e209901..6a54e4d 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3434,6 +3434,8 @@ asmlinkage void __sched schedule(void)
 			print_irqtrace_events(current);
 		dump_stack();
 	}
+	if (unlikely(current->io_context && current->io_context->plugged))
+		blk_replug_current_nested();
 	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
 
 need_resched:

-- 
Jens Axboe

-

From: Christoph Lameter
Date: Monday, February 5, 2007 - 11:20 am

The kernel that failed before boots fine with this patch.

-

From: Jens Axboe
Date: Monday, February 5, 2007 - 11:34 am

Wonderful, I'll leave the patch in-place for now.

-- 
Jens Axboe

-

From: David Chinner
Date: Thursday, February 1, 2007 - 9:08 pm

Jens, this patch looks like you originally removed the explicit
unplug calls that XFS used to prevent metadata I/O hangs and now you
are putting them back.  Correct?

Reading on from Andrew's earlier comments, shouldn't XFS have
worked unchanged? I'm just trying to understand why you removed
the explicit unplugs in the first place.....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-

From: Jens Axboe
Date: Friday, February 2, 2007 - 12:31 am

It should. The problem is that if someone has plugged higher up in the
hierarchy, then you do need the explicit replug to drain that before
going to sleep and waiting for IO to complete. I'm not very happy about
that situation; I'd prefer it if that happened automagically. I'll likely
change the code to fix that, so we don't have to sprinkle
blk_replug_current_nested() calls around and always call io_schedule()
instead of schedule(). It's just going to cause too many problems.

-- 
Jens Axboe

-

From: Cedric Le Goater
Date: Thursday, February 1, 2007 - 11:24 am

Hello !

changes in git-acpi.patch in 2.6.20-rc6-mm3 (and maybe before) broke the Summit 
sub-arch (IBM x440) compile :( 

thanks, 
 
C. 

  CC      arch/i386/kernel/cpu/intel.o
  CC      arch/i386/kernel/early_printk.o
arch/i386/kernel/srat.c: In function 'parse_cpu_affinity_structure':
arch/i386/kernel/srat.c:68: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:72: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:72: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:74: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:74: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:77: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:77: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c: In function 'parse_memory_affinity_structure':
arch/i386/kernel/srat.c:93: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:97: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:97: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:100: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:101: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:102: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:103: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:108: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:134: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:135: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c:136: error: dereferencing pointer to incomplete type
arch/i386/kernel/srat.c: In function 'acpi20_parse_srat':
arch/i386/kernel/srat.c:188: error: 'ACPI_SRAT_PROCESSOR_AFFINITY' undeclared (first use in this function)
arch/i386/kernel/srat.c:188: error: (Each undeclared identifier is reported only ...
From: Starikovskiy, Alexey Y
Date: Thursday, February 1, 2007 - 12:37 pm

Sorry, here is the patch. ACPI has switched to acpi_find_rsdp(), so
srat.c might want to do that too; please check.

Thanks,
From: Cedric Le Goater
Date: Thursday, February 1, 2007 - 1:29 pm

got it. running a compile and boot test. 

I should have the results in 'my' morning (UTC+1).

Thanks !

C.
-

From: Cedric Le Goater
Date: Thursday, February 1, 2007 - 1:38 pm

hmm, I got another issue while compiling:

  CHK     include/linux/compile.h
  UPD     include/linux/compile.h
  CC      init/version.o
  LD      init/built-in.o
  LD      .tmp_vmlinux1
arch/i386/kernel/built-in.o: In function `get_memcfg_from_srat':
/home/legoater/linux/2.6.20-rc6-mm3/arch/i386/kernel/srat.c:279: undefined reference to `acpi_find_root_pointer'


I'll catchup in the morning.

thanks,

C.
-

From: Starikovskiy, Alexey Y
Date: Friday, February 2, 2007 - 7:22 am

Hi,
I updated patch to use acpi_find_rsdp(), as all other code does.
Could you please try it?

Thanks,
From: Cedric Le Goater
Date: Friday, February 2, 2007 - 7:47 am

Hello !


so it probably means that drivers/acpi/tables/tbxfroot.c is 

sure, I'll cancel the current boot test in which I was using 
acpi_find_root_pointer() in tbxfroot.c and restart one with your


Thanks !

C.
-

From: Starikovskiy, Alexey Y
Date: Friday, February 2, 2007 - 7:50 am

How long does it take to boot this thing?
Regards,
	Alex.
-

From: Cedric Le Goater
Date: Friday, February 2, 2007 - 9:04 am

well, not that long, but I don't have direct access to this 
machine, only through a test batch manager...

C.
-

From: Cedric Le Goater
Date: Saturday, February 3, 2007 - 12:30 am

dmesg looks fine. However, there is a:

ACPI Warning (tbfadt-0415): Optional field "Gpe1Block" has zero address or length: 0000000000000000/4 [20070126]

but I don't know how to interpret this. Any idea?

thanks,


C.


Linux version 2.6.20-rc6-mm3-lxc2-autokern1 (root@fpos1) (gcc version 4.0.3 (Ubuntu 4.0.3-1ubuntu5)) #1 SMP Fri Feb 2 20:38:46 UTC 2007
BIOS-provided physical RAM map:
sanitize start
sanitize end
copy_e820_map() start: 0000000000000000 size: 000000000009dc00 end: 000000000009dc00 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 000000000009dc00 size: 0000000000002400 end: 00000000000a0000 type: 2
copy_e820_map() start: 00000000000e0000 size: 0000000000020000 end: 0000000000100000 type: 2
copy_e820_map() start: 0000000000100000 size: 00000000dfea25c0 end: 00000000dffa25c0 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 00000000dffa25c0 size: 0000000000009c80 end: 00000000dffac240 type: 3
copy_e820_map() start: 00000000dffac240 size: 0000000000053dc0 end: 00000000e0000000 type: 2
copy_e820_map() start: 00000000fec00000 size: 0000000001400000 end: 0000000100000000 type: 2
copy_e820_map() start: 0000000100000000 size: 0000000120000000 end: 0000000220000000 type: 1
copy_e820_map() type is E820_RAM
 BIOS-e820: 0000000000000000 - 000000000009dc00 (usable)
 BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000dffa25c0 (usable)
 BIOS-e820: 00000000dffa25c0 - 00000000dffac240 (ACPI data)
 BIOS-e820: 00000000dffac240 - 00000000e0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000220000000 (usable)
Node: 0, start_pfn: 0, end_pfn: 157
Node: 0, start_pfn: 256, end_pfn: 917410
Node: 0, start_pfn: 1048576, end_pfn: 2228224
get_memcfg_from_srat: assigning address to rsdp
RSD PTR  v0 [IBM   ]
Begin SRAT table scan....
CPU 0x00 in proximity domain 0x00
CPU 0x02 in ...
From: Starikovskiy, Alexey Y
Date: Saturday, February 3, 2007 - 12:57 am

This warning should probably be disabled, so as not to confuse users. The
spec says that some registers are optional, and ACPICA used to stay silent
when it encountered one, but now it produces this meaningless warning.
Ignore it.

Regards,
-

From: Daniel Walker
Date: Friday, February 2, 2007 - 10:39 am

I was running likely/unlikely profiling and I noticed that when I turn on the
"tickless" option I get the following line:

+unlikely | 21111208|  9367278  need_resched()@:include/linux/sched.h@1597

This means that this condition was true 21111208 times and false 9367278 times.

This existed on bootup, and stayed after about 7 hours of runtime
(mostly idle). Since need_resched() is a "static inline", there are
multiple instances of need_resched() in the kernel. If I turn off the
"tickless" feature, this mispredicted unlikely() disappears, and the
output looks like this:

 unlikely |        0|      169  need_resched()@:include/linux/sched.h@1597
 unlikely |        0|       94  need_resched()@:include/linux/sched.h@1597
 unlikely |        0|       63  need_resched()@:include/linux/sched.h@1597
 unlikely |        0|       19  need_resched()@:include/linux/sched.h@1597
 unlikely |        0|      379  need_resched()@:include/linux/sched.h@1597
 unlikely |        1|   202596  need_resched()@:include/linux/sched.h@1597
 unlikely |        7|   205929  need_resched()@:include/linux/sched.h@1597
 unlikely |     6461|   271690  need_resched()@:include/linux/sched.h@1597

Only a little after boot. I suppose this could be a natural side effect
of the tickless feature but I thought I would report it anyway.

Daniel

-

From: Daniel Walker
Date: Tuesday, February 6, 2007 - 9:03 am

It appears there is a problem with the /proc/interrupts entry for
"timer" .. It doesn't increment anymore .. This problem exists in the
-rt tree also .. I haven't done a bisect, but I'm assuming this is HRT
related ..

Also my NMI watchdog isn't functioning, which also happens in the -rt
tree and -mm .. Also likely HRT related ..

I don't have HRT or dynamic tick turned on .. This started happening in
-mm2, and it worked in -mm1 ..

Daniel

-

From: Thomas Gleixner
Date: Tuesday, February 6, 2007 - 11:36 am

And why should it increment? Is there a rule that it has to?

	tglx



-

From: Daniel Walker
Date: Tuesday, February 6, 2007 - 11:45 am

I don't know .. I would imagine some users might look at it and wonder
why their timer isn't ticking (I know it actually is ticking, but they
don't), when it has in every other kernel.

We could just remove the timer entry.

Daniel

-

From: Thomas Gleixner
Date: Tuesday, February 6, 2007 - 12:07 pm

No, we can't. The timer interrupt is set up and it does not go away, as we
keep the PIT as a backup for the broken lapics.

	tglx




-

From: Daniel Walker
Date: Tuesday, February 6, 2007 - 12:55 pm

I'm not trying to create anything .. However, as I said before
the /proc/interrupts "timer" entry doesn't work the same as it has in
other kernels.


Ok, how about adding the interrupts to the list which are driving the
timer ?

Daniel

-

From: Thomas Gleixner
Date: Tuesday, February 6, 2007 - 1:20 pm

Yes, it is different. Why are you insisting that something is a problem

Simply because it works and it does not make any sense to have a per cpu
timer (lapic) and the PIT firing at the same periodic interval. PIT does
nothing other than jiffies64++. The clockevents code just optimizes that
away and lets one cpu do the jiffies64++ in its periodic per cpu
interrupt.


Uurg. /proc/interrupts has nothing to do with timers. It's interrupt
statistics. See LOC entry for the lapic ones.

	tglx


-

From: Daniel Walker
Date: Tuesday, February 6, 2007 - 1:40 pm

In this case "different" goes into userspace .. So different could mean
userspace regression, which is something that we don't want. I have no
idea if any apps use /proc/interrupts , but it's possible since it's
been around for a long time.

The reason that I'm bringing it up at all is because people have asked me

You're saying we can't remove it though; if /proc/interrupts is not related
to timers, why does the entry exist at all? You're saying the LOC entry is
the new "timer" entry, but we still have the old "timer" entry ..

Getting confusing ..

It might be nicer to list all the registered clock event sources
in /proc/interrupts, with more descriptive names .. 

Why is it that HRT doesn't use the "timer" as a valid timer?

Daniel

-

From: Ingo Molnar
Date: Tuesday, February 6, 2007 - 1:52 pm

Well, if you enable dynticks you should expect the number of timer irqs 

it's quite easy to explain: because of the new dynticks feature. Both 

they are already listed in /proc/interrupts, depending on how they use 
interrupts. For a more complete list of in-use clockevent drivers see 
/proc/timer_info. But it would be wrong to touch /proc/interrupts to 
create some special-case for clockevents.

	Ingo
-

From: Daniel Walker
Date: Tuesday, February 6, 2007 - 1:56 pm

I don't have that enabled tho .. This is with HRT/dynamic tick both
off..


Daniel

-

From: Ingo Molnar
Date: Tuesday, February 6, 2007 - 2:09 pm

your kernel utilizes the kernelin a more optimal way: the new 
clockevents code now utilizes the local APIC timer irq (represented by 
the LOC field) for periodic interrupts. The local APIC timer irq has a 
cost of ~2 usecs per IRQ, while the PIT irq is ~10 usecs per irq. With 
HZ=1000 this means savings of ~8000 usecs per second - i.e. 8 msecs per 
second, which is 0.8% more raw CPU power available - which isn't that
bad.

we could make this clearer by renaming 'LOC' (which stands for 'LOCal 
timer interrupts' and was added [and misnamed] by yours truly many moons 
ago) to 'apic-timer' and 'timer' to 'PIT-timer' but /that/ would be more 
of a userspace visible change than the change in the counter rates.

	Ingo
-

From: Ingo Molnar
Date: Tuesday, February 6, 2007 - 2:20 pm

Ingo
-

From: Daniel Walker
Date: Tuesday, February 6, 2007 - 2:23 pm

If we change the current "timer" entry to be listed as "lapic-timer" and
not "IO-APIC-edge" (or one of the other names) and replace it with the
count from LOC, that would make sense because that field already changes
depending on whether you have an io-apic or not ..

I think the regression (if you can call it that) is not scripts
crashing, but more people not knowing what's going on with their system .. 

Daniel

-

From: Ingo Molnar
Date: Tuesday, February 6, 2007 - 2:41 pm

doing that would not fake the old behavior (which is your suggestion), 
LOC is per CPU, while the PIT timer irq that was there is global.

But, as per the previous mails, the new behavior is just fine, because 
/proc/interrupts just reflects reality. And the way the kernel utilizes 
the hardware has just changed - for the better.

The same happens when say a network driver implements NAPI: the IRQ 
count goes way, way down. Or if a driver starts supporting MSI - the IRQ 
line even moves to another one. Do we try to fix those counts up to 
match the 'previous behavior'? Of course not. What you are suggesting 
makes no sense, is against current kernel practices - as we pointed it 

(that is something else: it's different because a different irq-chip is 
behind it.)

	Ingo
-

From: Daniel Walker
Date: Tuesday, February 6, 2007 - 2:54 pm

I'm not saying we should "fake" anything .. I'm saying list what's
really happening .. In a human readable way .

You're saying we should keep it unreadable, and let the users be that much

Why is that not the case with lapic ? 

Daniel

-

From: Ingo Molnar
Date: Tuesday, February 6, 2007 - 3:08 pm

"replace the timer entry with lapic-timer and put the LOC count there" 
is faking something that does not reflect reality. The 'timer' count is 

we list precisely what is happening: the number of IRQ#0 interrupts and 
the number of local APIC timer interrupts. Precisely where their 
traditional place is.

i think you might be confused by the generic name that says 'timer'. You 
should notice the other bits that are there too:

           CPU0       CPU1
  0:        495          0   IO-APIC-edge      timer

the '0' means IRQ#0. That makes it clear that this is the PIT timer. 
Clearer now?

	Ingo
-

From: Daniel Walker
Date: Tuesday, February 6, 2007 - 3:42 pm

I'm not trying to suggest we "fake" anything. You're just misunderstanding
me.. I am suggesting we change LOC to something readable. If you think
we're "faking" something by dropping the current "timer" request_irq()
then we certainly don't need to do that ..

The io-apic timer could potentially be a clock event device; that is
its function, isn't it? It generates interrupts (note I said
interrupts) periodically.. The NMI is another example of that: it
generates non-maskable interrupts based off a clock.. All are clock
based interrupt generating devices.. All could be clock event sources,
along with all the other clock event sources in the system.

It makes sense (to me at least) that we should list all those interrupt
generating devices in /proc/interrupts with statistics of their usage.. 

I'm making suggestions here, you can call it "fake"'ing something or

Empirically, I know that users do not/will not understand what's
happened. So take that however you want, but _I_ think we should do
something so people better understand what has happened.

Daniel

-

From: Ingo Molnar
Date: Tuesday, February 6, 2007 - 3:56 pm

as i pointed it out in the previous mail, the problem is that what you 

changing the current 'timer' entry (which is line 2 of /proc/interrupts) 
to be 'listed as lapic-timer' and to 'replace it with the count from 
LOC' is faking a count in a line where nothing like that should be.

the kernel simply displays reality: IRQ#0 isn't increasing because it's 
not used, and LOC (local apic timers) is increasing.

	Ingo
-

From: Daniel Walker
Date: Tuesday, February 6, 2007 - 4:04 pm

What about the statistics for the other interrupts in the system ? It
clearly doesn't list all interrupts in the system .

Daniel

-

From: Ingo Molnar
Date: Tuesday, February 6, 2007 - 4:14 pm

it is very much relevant: faking a count is something we /don't/ want to 
do with /proc/interrupts, for (very) basic compatibility, simplicity and 
policy reasons. And that is precisely what your suggestion was to 

what is your point?

	Ingo
-

From: Daniel Walker
Date: Tuesday, February 6, 2007 - 4:22 pm

As I said you are misunderstanding me .. which is why this is not
relevant any more .. 


Isn't the listing inconsistent? /proc/interrupts shows only some
special interrupts, and not others.. For example it shows NMI, which is
not related to request_irq().. It shows some clock driver devices
(timer, NMI, LOC) and not others (clock event devices)..

Daniel

-

From: Ingo Molnar
Date: Tuesday, February 6, 2007 - 4:28 pm

actually, i quoted what you said:

| If we change the current "timer" entry to be listed as "lapic-timer" 
| and not "IO-APIC-edge" (or one of the other names) and replace it with 
| the count from LOC

this is a pretty clear sentence, i don't think i misunderstood anything 
about it. If i did, please point it out specifically.

	Ingo
-

From: Daniel Walker
Date: Tuesday, February 6, 2007 - 4:35 pm

Geez, man, I've corrected this statement already.. Why don't you quote
the corrections? You're not listening, because you're ignoring everything I
said after this, and accepting only my first statement and rejecting
everything else.. Like you want this to descend into a melee.

Last and final correction. I'm saying drop the timer entry, which means
drop the call to request_irq() for irq0. Add lines for lapic-timer
which take the place of LOC..

Daniel

-

From: Thomas Gleixner
Date: Tuesday, February 6, 2007 - 4:44 pm

Right, that's a real good suggestion. Here's the patch especially for
you. Apply it and figure out yourself, why your computer won't boot
anymore.

	tglx


Index: linux-2.6.20/arch/i386/mach-default/setup.c
===================================================================
--- linux-2.6.20.orig/arch/i386/mach-default/setup.c
+++ linux-2.6.20/arch/i386/mach-default/setup.c
@@ -95,8 +95,10 @@ static struct irqaction irq0  = {
  **/
 void __init time_init_hook(void)
 {
+#ifdef CONFIG_THIS_IS_NOT_DWALKERS_COMPUTER
 	irq0.mask = cpumask_of_cpu(0);
 	setup_irq(0, &irq0);
+#endif
 }
 
 #ifdef CONFIG_MCA


-

From: Ingo Molnar
Date: Tuesday, February 6, 2007 - 4:51 pm

i'm sorry, but where did you "correct this statement already"? You 
haven't replied to your mail to correct it explicitly, and there's no 
later statement of yours that says anything near to "let me correct this 
via X" or "i was wrong here, i meant Y".

the only subsequent reference of yours seems to be:

| I'm not saying we should "fake" anything .. I'm saying list what's 
| really happening .. In a human readable way .

what you write here does not read as a 'correction', this disputes my 
characterisation, suggesting that your original point is still intact. 
How should i have known that you meant this to be a 'correction' of your 
original point, and that this (whatever it means precisely) replaces it?

if you concede a point or correct a statement then /please/ make it 
clear. There's nothing bad about being wrong or being stupid 

it's not a request_irq() but a setup_irq().

dropping the IRQ#0 line would be fatally wrong: /proc/interrupts lists 
all active interrupt lines. There can be (and often is) a count in IRQ#0. 
Why should it be hidden?

furthermore, as i pointed it out earlier: what you suggest is bad for 
compatibility: removing/changing the non-count portions of the LOC or 
the IRQ#0 entry /will/ break scripts.

	Ingo
-

From: Daniel Walker
Date: Tuesday, February 6, 2007 - 7:46 pm

I guess I will respond ....


I know that you see corrections as responses to your own email, but it's
not universal .. Everyone has their own methodology, AFAIK it's free

I don't take a literal approach to email, which you seem to be taking ..
I think you're seeing this thread as an argument for or against something
and you have taken a position which you diligently stick to..

My position is not fixed. However, you're arguing as if my position was
fixed. My perspective of this thread was not to argue for a specific
change, but to throw out changes and see if anything stuck ..

My statements were supposed to be loose to begin with, so loose as
to only spark the start of an idea, not to promote something specific.
If you read the start of the thread you'll notice that I gave Thomas two
totally opposite ideas.

"We could just remove the timer entry."
or
"[..]how about adding the interrupts to the list which are driving the
timer ?"

When I started the thread I had a similar position as Thomas, but I was
concerned that I was missing something or the code was missing
something.. This was the reason for starting the thread ..

So I'll gladly concede all points. To me it wasn't about the argument,
or even my own ideas ..

Daniel

-

From: Thomas Gleixner
Date: Tuesday, February 6, 2007 - 4:36 pm

It shows _ALL_ used interrupts in the system. There is no point to let

PIT is a clock event device and uses IRQ0, where the interrupt count is
displayed:

0:    3022812          0   IO-APIC-edge      timer

Local APIC timer is a clock event device too and the interrupt count is
displayed as well:

LOC:     177795    1755941 

There are no other clock event devices in a PC system at the moment
and /proc/interrupts does not care whether the interrupt was set up for a
clock event device or something else. It displays the name which is
given in the irqaction struct and does not care what it means. I did not
change the name in the IRQ#0 setup, so it still displays "timer" (which
can either be PIT or HPET), but this is something the interrupt layer
does not know and does not care about.

The special interrupts, which are not handled by the generic IRQ layer
(LOC, NMI) are displayed to have the complete statistics available.

We did not change anything on that. The changed behavior you are
observing (IRQ#0 is not incrementing) is reflecting the reality of the
system. IRQ#0 is not firing, so it does not increment.

	tglx


-

From: Daniel Walker
Date: Tuesday, February 6, 2007 - 6:12 pm

So you're saying the "timer" entry in /proc/interrupts can be either the
HPET timer or the PIT timer? Mine says "IO-APIC-edge"; which one does that
map to? It's going through the io-apic but it's still the PIT?

Daniel

-

From: Thomas Gleixner
Date: Wednesday, February 7, 2007 - 7:53 am

IO-APIC: Input/Output Advanced Programmable Interrupt Controller. This
device does not generate interrupts by itself. Devices, which generate
interrupts are connected to it.

23:         82          0   IO-APIC-fasteoi   ohci1394, HDA Intel

This is IRQ#23 coming in via IO-APIC (fasteoi type). The interrupt is
shared by two devices, which identified themselves as "ohci1394" and
"HDA Intel" via request_irq(). The interrupt originates from one of those
devices. So it _IS_ going through the IO-APIC, but generated either by
the Firewire device or the Audio device.

  0:     186222          0   IO-APIC-edge      timer

This is IRQ#0 coming in via IO-APIC (edge type). The interrupt is not
shared. The device identified itself as "timer" via setup_irq(). The
interrupt originates from this device. The interrupt is either caused by
PIT or HPET via a hardware switch mechanism, which is activated when you
use HPET. There is no way to share IRQ#0 here. It's either/or, as defined
by hardware magic.

	tglx



-

From: Ingo Molnar
Date: Tuesday, February 6, 2007 - 4:37 pm

it's not inconsistent. /proc/interrupts lists registered interrupts plus 
some special hardcoded platform interrupts that are not explicitly 
registered - with the goal of providing a list of all active interrupt 
sources. /proc/interrupts has been doing that for more than 10 years. 
Clock event devices themselves are not 'interrupt lines', why should 
they be listed in /proc/interrupts?

	Ingo
-

From: Thomas Gleixner
Date: Tuesday, February 6, 2007 - 3:13 pm

We do that. IRQ0 is not happening. So simply it does not increment the

It is readable, as it reflects the reality which is going on in the
system and not some artificial view which you think is how the interrupt
count should be presented. /proc/interrupts _IS_ statistics about the

Local APIC is not really part of the interrupt subsystem as it uses a
separate entry vector for historic reasons and therefore is not handled
by setup/request/free/... _irq() functions.

	tglx


-

From: Thomas Gleixner
Date: Tuesday, February 6, 2007 - 2:43 pm

No. We are not fiddling with the IRQ subsystem statistics. IRQ subsystem
is unrelated to timers. And we do switch away from PIT if we have a
local APIC timer, so the output of /proc/interrupts is just a mirror of
the real system and not some made up thing, which will make it harder to

I did not hear a complaint from anyone except you. I doubt that there will
be much confusion as long as the kernel works as expected.

	tglx


-

From: Daniel Walker
Date: Tuesday, February 6, 2007 - 2:59 pm

This is going to be a slow motion explosion ..

Daniel

-

From: Thomas Gleixner
Date: Tuesday, February 6, 2007 - 2:17 pm

It _IS_ statistics info about the number of interrupts and has no fixed
meaning at all. It does not cause any user space regression, as the
interface is still the same. It produces different numbers, like the
clock_getres() syscall returns different values on highres and !highres

So it's a problem of user perception and not of a user space regression.

Ok. Each irqaction struct which is used to request/setup an interrupt
contains a name field. This is the one which shows up
in /proc/interrupts. The one which is used to setup irq0 has .name =


No. No. No. clockevents has nothing to do with /proc/interrupts.


Because local apic timer is better.

	tglx


-

From: Daniel Walker
Date: Tuesday, February 6, 2007 - 2:25 pm

At least we agree on this point .. I'm not trying to confuse anything,
this issue needs to be discussed ..

Daniel

-

From: Rob Landley
Date: Tuesday, February 6, 2007 - 4:15 pm

Because there are two clock sources in the machine and it's using the other 
one, so the interrupt isn't firing?

Are you saying that the /proc statistics aren't accurate, or that you 
previously misunderstood what it was actually measuring and you'd now like it 

I didn't think Thomas even touched the /proc/interrupts reporting code.  It's 
still accurate.  The patch changed the usage of timers, /proc/interrupts is 
accurately showing the change, and you're surprised that what it was 
measuring wasn't what you thought it was measuring all along.

This ain't jiffies.  This is how often the PIT fired.  They are not the same 
thing.

Rob
-- 
"Perfection is reached, not when there is no longer anything to add, but
when there is no longer anything to take away." - Antoine de Saint-Exupery
-

From: Daniel Walker
Date: Tuesday, February 6, 2007 - 4:28 pm

I understand exactly what is happening. The statistics are unclear, and
tend to confuse people.

Daniel

-

From: Rob Landley
Date: Tuesday, February 6, 2007 - 4:55 pm

Ah, you can't answer this question.  Right:

A) Because there are multiple timer interrupt sources in the system, and we're 
now using a newer (better) one.

B) This measures interrupts.  It doesn't measure jiffies.  Interrupts != 
jiffies.  This is a conceptual issue.



Rob
-- 
"Perfection is reached, not when there is no longer anything to add, but
when there is no longer anything to take away." - Antoine de Saint-Exupery
-

From: Adrian Bunk
Date: Tuesday, February 6, 2007 - 3:11 pm

acpi_os_readable() is no longer used.

Signed-off-by: Adrian Bunk <bunk@stusta.de>

---

 drivers/acpi/osl.c      |    2 --
 include/acpi/acpiosxf.h |    3 +--
 2 files changed, 1 insertion(+), 4 deletions(-)

--- linux-2.6.20-rc6-mm3/include/acpi/acpiosxf.h.old	2007-02-06 06:57:15.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/acpi/acpiosxf.h	2007-02-06 06:57:53.000000000 +0100
@@ -240,9 +240,8 @@
 acpi_os_validate_address(u8 space_id,
 			 acpi_physical_address address, acpi_size length);
 
-u8 acpi_os_readable(void *pointer, acpi_size length);
-
 #ifdef ACPI_FUTURE_USAGE
+u8 acpi_os_readable(void *pointer, acpi_size length);
 u8 acpi_os_writable(void *pointer, acpi_size length);
 #endif
 
--- linux-2.6.20-rc6-mm3/drivers/acpi/osl.c.old	2007-02-06 07:18:33.000000000 +0100
+++ linux-2.6.20-rc6-mm3/drivers/acpi/osl.c	2007-02-06 07:18:54.000000000 +0100
@@ -888,7 +888,6 @@
 
 	return 0;
 }
-#endif				/*  ACPI_FUTURE_USAGE  */
 
 /* Assumes no unreadable holes inbetween */
 u8 acpi_os_readable(void *ptr, acpi_size len)
@@ -901,7 +900,6 @@
 	return 1;
 }
 
-#ifdef ACPI_FUTURE_USAGE
 u8 acpi_os_writable(void *ptr, acpi_size len)
 {
 	/* could do dummy write (racy) or a kernel page table lookup.

-

From: Adrian Bunk
Date: Tuesday, February 6, 2007 - 3:12 pm

This patch contains the following possible cleanups:
- move extern declarations to atl1.h
- make needlessly global code static

Signed-off-by: Adrian Bunk <bunk@stusta.de>

---

BTW: Can we get a MAINTAINERS entry for this driver?

 drivers/net/atl1/atl1.h         |    6 ++++--
 drivers/net/atl1/atl1_ethtool.c |    3 ---
 drivers/net/atl1/atl1_hw.c      |    6 ++----
 drivers/net/atl1/atl1_main.c    |    8 +++-----
 drivers/net/atl1/atl1_param.c   |    4 +---
 5 files changed, 10 insertions(+), 17 deletions(-)

--- linux-2.6.20-rc6-mm3/drivers/net/atl1/atl1.h.old	2007-02-06 07:55:58.000000000 +0100
+++ linux-2.6.20-rc6-mm3/drivers/net/atl1/atl1.h	2007-02-06 08:19:50.000000000 +0100
@@ -34,8 +34,10 @@
 s32 atl1_up(struct atl1_adapter *adapter);
 void atl1_down(struct atl1_adapter *adapter);
 int atl1_reset(struct atl1_adapter *adapter);
-s32 atl1_setup_ring_resources(struct atl1_adapter *adapter);
-void atl1_free_ring_resources(struct atl1_adapter *adapter);
+
+extern char atl1_driver_name[];
+extern char atl1_driver_version[];
+extern const struct ethtool_ops atl1_ethtool_ops;
 
 struct atl1_adapter;
 
--- linux-2.6.20-rc6-mm3/drivers/net/atl1/atl1_hw.c.old	2007-02-06 07:52:20.000000000 +0100
+++ linux-2.6.20-rc6-mm3/drivers/net/atl1/atl1_hw.c	2007-02-06 07:56:22.000000000 +0100
@@ -31,8 +31,6 @@
 #include "atl1.h"
 
 
-extern char atl1_driver_name[];
-
 /**
  * Reset the transmit and receive units; mask and clear all interrupts.
  * hw - Struct containing variables accessed by shared code
@@ -209,7 +207,7 @@
  * get_permanent_address
  * return 0 if get valid mac address, 
  **/
-int atl1_get_permanent_address(struct atl1_hw *hw)
+static int atl1_get_permanent_address(struct atl1_hw *hw)
 {
 	u32 addr[2];
 	u32 i, control;
@@ -602,7 +600,7 @@
 	return ret_val;
 }
 
-struct atl1_spi_flash_dev flash_table[] = {
+static struct atl1_spi_flash_dev flash_table[] = {
 /*	MFR_NAME  WRSR  READ  PRGM  WREN  WRDI  RDSR  RDID  SECTOR_ERASE CHIP_ERASE */
 ...
From: Jay Cliburn
Date: Tuesday, February 6, 2007 - 5:19 pm

On Tue, 6 Feb 2007 23:12:29 +0100

Adrian,

The atl1 driver currently follows this development pathway:

developer -> netdev#atl1 -> netdev#ALL -> -mm

Your patch is just a little bit out ahead of us.  Some of your suggested
changes are already in the pipeline; we're just waiting for Jeff to

Already submitted to netdev#atl1.


netdev#atl1 already has this change.

The rest of these I'll bundle up and submit to netdev#atl1, too.  Will
-

From: Jeff Garzik
Date: Tuesday, February 6, 2007 - 5:22 pm

Technical note:  merging #atl1 into #ALL happens each time 
netdev-2.6.git is flushed out from my local machine.

	Jeff


-

From: J. K. Cliburn
Date: Tuesday, February 6, 2007 - 5:24 pm

Noted.  Thanks.
-

From: Adrian Bunk
Date: Tuesday, February 6, 2007 - 5:24 pm

Do what you consider the right thing - I don't care how it gets into the 
various trees.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-

From: Adrian Bunk
Date: Tuesday, February 6, 2007 - 3:12 pm

This patch contains the following cleanups:
- proper prototypes for global code in aacraid.h
- aac_rx_start_adapter() can now become static

Signed-off-by: Adrian Bunk <bunk@stusta.de>

---

 drivers/scsi/aacraid/aacraid.h |    3 +++
 drivers/scsi/aacraid/linit.c   |    2 --
 drivers/scsi/aacraid/nark.c    |    3 ---
 drivers/scsi/aacraid/rkt.c     |    3 ---
 drivers/scsi/aacraid/rx.c      |    2 +-
 5 files changed, 4 insertions(+), 9 deletions(-)

--- linux-2.6.20-rc6-mm3/drivers/scsi/aacraid/aacraid.h.old	2007-02-06 08:22:50.000000000 +0100
+++ linux-2.6.20-rc6-mm3/drivers/scsi/aacraid/aacraid.h	2007-02-06 08:27:17.000000000 +0100
@@ -1840,8 +1840,11 @@
 int aac_get_adapter_info(struct aac_dev* dev);
 int aac_send_shutdown(struct aac_dev *dev);
 int aac_probe_container(struct aac_dev *dev, int cid);
+int _aac_rx_init(struct aac_dev *dev);
+int aac_rx_select_comm(struct aac_dev *dev, int comm);
 extern int numacb;
 extern int acbsize;
 extern char aac_driver_version[];
 extern int startup_timeout;
 extern int aif_timeout;
+extern int expose_physicals;
--- linux-2.6.20-rc6-mm3/drivers/scsi/aacraid/rx.c.old	2007-02-06 08:21:40.000000000 +0100
+++ linux-2.6.20-rc6-mm3/drivers/scsi/aacraid/rx.c	2007-02-06 08:21:50.000000000 +0100
@@ -294,7 +294,7 @@
  *	Start up processing on an i960 based AAC adapter
  */
 
-void aac_rx_start_adapter(struct aac_dev *dev)
+static void aac_rx_start_adapter(struct aac_dev *dev)
 {
 	struct aac_init *init;
 
--- linux-2.6.20-rc6-mm3/drivers/scsi/aacraid/linit.c.old	2007-02-06 08:23:20.000000000 +0100
+++ linux-2.6.20-rc6-mm3/drivers/scsi/aacraid/linit.c	2007-02-06 08:23:26.000000000 +0100
@@ -82,8 +82,6 @@
 static int aac_cfg_major = -1;
 char aac_driver_version[] = AAC_DRIVER_FULL_VERSION;
 
-extern int expose_physicals;
-
 /*
  * Because of the way Linux names scsi devices, the order in this table has
  * become important.  Check for on-board Raid first, add-in cards second.
--- ...
From: Adrian Bunk
Date: Tuesday, February 6, 2007 - 3:12 pm

This patch makes the needlessly global gfs2_writepages() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>

--- linux-2.6.20-rc6-mm3/fs/gfs2/ops_address.c.old	2007-02-06 08:30:19.000000000 +0100
+++ linux-2.6.20-rc6-mm3/fs/gfs2/ops_address.c	2007-02-06 08:30:32.000000000 +0100
@@ -170,7 +170,8 @@
  * and write whole extents at once. This is a big reduction in the
  * number of I/O requests we send and the bmap calls we make in this case.
  */
-int gfs2_writepages(struct address_space *mapping, struct writeback_control *wbc)
+static int gfs2_writepages(struct address_space *mapping,
+			   struct writeback_control *wbc)
 {
 	struct inode *inode = mapping->host;
 	struct gfs2_inode *ip = GFS2_I(inode);

-

From: Steven Whitehouse
Date: Wednesday, February 7, 2007 - 3:50 am

Hi,

Now applied to the GFS2 -nmw git tree. Thanks,

Steve.


-
