Re: 2.6.21-rc2 regression vs. 2.6.20: AT keyboard only works with pci=noacpi

From: Ash Milsted
Date: Saturday, March 3, 2007 - 8:14 am

Hi,
With 2.6.21-rc2-git1 I have a problem with my ps/2 port keyboard - it only works
with one of the following on the command-line:
 - nolapic
 - irqfixup
 - pci=noacpi
Otherwise it gets stuck with the numlock on.
The following options have no effect:
 - nohz=off (who knows eh?)
 - pci=nomsi

Here is my lspci:
00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge (rev 80)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
00:0a.0 Network controller: RaLink RT2500 802.11g Cardbus/mini-PCI (rev 01)
00:0b.0 Multimedia audio controller: Creative Labs SB Audigy (rev 04)
00:0b.1 Input device controller: Creative Labs SB Audigy Game Port (rev 04)
00:0b.2 FireWire (IEEE 1394): Creative Labs SB Audigy FireWire Port (rev 04)
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 78)
01:00.0 VGA compatible controller: nVidia Corporation NV11 [GeForce2 MX/MX 400] (rev a1)

And here is a boot log without any of those parameters (keyboard fails):
Mar  3 14:38:24 joker syslog-ng[4183]: syslog-ng starting up; version='2.0.0' 
Mar  3 14:38:24 joker a0000 (reserved)
Mar  3 14:38:24 joker limit_regions endfor: 00000000000f0000 - ...
From: Ash Milsted
Date: Sunday, March 4, 2007 - 7:23 am

On Sat, 3 Mar 2007 15:14:24 +0000
> Mar  3 14:43:13 joker pnp: Device 00:
From: Ash Milsted
Date: Wednesday, March 7, 2007 - 5:00 am

On Sun, 4 Mar 2007 14:23:50 +0000

Any thoughts on this? It still occurs with 2.6.21-rc3. Here's my config
in case that helps. You'll see that I have swap-prefetch patched in (I
also have RDSL and some VM changes in there), but I have confirmed that
the problem occurs with no extra patches. By the way, I tested mm1 with
a rather different config (I used my distro package) and still saw the
problem.

Also, you should probably ignore that bit above where I suggest the
keyboard driver is being loaded as a module, because of course it
isn't.. Yet it does start responding (with pci=noacpi) at about that
time that udev does its thing.

Anyhoo:
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.21-beyond
# Tue Mar  6 15:07:17 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION="ash"
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SWAP_PREFETCH=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_UTS_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# ...
From: Dmitry Torokhov
Date: Wednesday, March 7, 2007 - 7:22 am

Your config looks fine, so it must be some ACPI change that affected
IRQ routing. If IRQs are not being delivered, the AT keyboard probe will
time out. You said that it broke between 2.6.20 and 2.6.21-rc2. Have you
tried -rc1?

That happens because the actual keyboard/mouse probing is offloaded to the
kseriod thread, so nothing happens until it actually gets scheduled.

-- 
Dmitry

From: Ash Milsted
Date: Wednesday, March 7, 2007 - 2:25 pm

On Wed, 7 Mar 2007 12:00:04 +0000
schnip

So, I tracked this down to 2.6.21-git7, the first snapshot that gives me
this problem. Tellingly it does contain an input tree merge. I would git bisect
but I don't have a local copy of the tree - I tried to get one, but it stopped
halfway through the clone, probably because I had to use http... So, I hope that
helps.

Ash

PS: I should have said I'm not subscribed, so please CC me on reply.
PPS: That almost rhymes. Almost.

From: Linus Torvalds
Date: Wednesday, March 7, 2007 - 2:47 pm

Hmm. There is no "2.6.21-git7" (that would be the seventh nightly snapshot 
after 2.6.21 is released, which hasn't happened yet!).

Do you mean that it happens between 2.6.20-git6 and 2.6.20-git7? That 
would be git commits (the way to get them is to look at the "*.id" file 
that is associated with a snapshot):

	66efc5a7e3061c3597ac43a8bb1026488d57e66b -git6
	509cb37e173d4e39cec47238397e91b718730794 -git7

and yes, doing a

	gitk 66efc5a7..509cb37e


Can you try "rsync"? It's not a great protocol in general, but it's 
perfectly fine for an initial clone..

After that, since you have already narrowed it down to a particular 
nightly snapshot, you could do the bisection starting from the known 
commits already:

	git bisect start
	git bisect good 66efc5a7	# 2.6.20-git6 was good
	git bisect bad 509cb37e		# 2.6.20-git7 was bad

and you'll have less than 500 commits to test (which is quite fast to 
bisect).

If you want to do some manual checking first (ie guessing that the 
bad behaviour came from that particular input merge), you could first try 
out commit 2a598df5, which is the head commit of the merged input 
tree (this is all trivial to see with the above "gitk" - the SHA1's may 
sound scary and esoteric, but they're really easy to look up).

That manual check (*if* it turns out that 2a598df5 is indeed the bad one) 
would cut down the range from good to bad to just 18 commits (the range 
would be 66efc5a7..2a598df5), and then you should be able to pinpoint the 
exact bad one from just a few reboots..

(If you have to bisect all 500 commits, it would be ~10 reboots rather 
than four or five).
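
As a rough check on those reboot counts, here is a minimal sketch in C. It
assumes each test cuts the remaining range roughly in half; real history is
a graph rather than a straight line, so git's own estimate can differ by a
step. The function name bisect_steps is made up for illustration.

```c
#include <assert.h>

/*
 * Worst-case number of kernels to build and boot when bisecting a range
 * of n candidate commits. Each test halves the remaining range, so the
 * result is ceil(log2(n)). Integer-only, so no libm is needed.
 */
unsigned int bisect_steps(unsigned int n)
{
	unsigned int steps = 0;

	assert(n > 0);
	while (n > 1) {
		n = (n + 1) / 2;	/* keep the larger half: worst case */
		steps++;
	}
	return steps;
}
```

Under these assumptions bisect_steps(500) comes out at 9 and
bisect_steps(18) at 5, in line with the "~10" and "four or five" estimates
above.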

		Linus

From: Dmitry Torokhov
Date: Wednesday, March 7, 2007 - 2:50 pm

Hm, that is strange... 2.6.20-git7 has the i8042 AUX IRQ delivery test fix
and a fix for panic blink; neither should really affect your keyboard.
Can I please get full dmesg of boot with "i8042.debug
log_buf_len=131072"?

-- 
Dmitry

From: Dmitry Torokhov
Date: Wednesday, March 7, 2007 - 9:14 pm

Argh, I can't believe I forgot to get this into my tree. Could you please
tell me if the patch below fixes your issue?

-- 
Dmitry

Input: i8042 - another attempt to fix AUX delivery checks

Do not assume that AUX_LOOP command is broken unless it
completes successfully but returns wrong (unexpected) data.

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
---
 drivers/input/serio/i8042.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Index: linux/drivers/input/serio/i8042.c
===================================================================
--- linux.orig/drivers/input/serio/i8042.c
+++ linux/drivers/input/serio/i8042.c
@@ -553,7 +553,8 @@ static int __devinit i8042_check_aux(voi
  */
 
 	param = 0x5a;
-	if (i8042_command(&param, I8042_CMD_AUX_LOOP) || param != 0x5a) {
+	retval = i8042_command(&param, I8042_CMD_AUX_LOOP);
+	if (retval || param != 0x5a) {
 
 /*
  * External connection test - filters out AT-soldered PS/2 i8042's
@@ -567,7 +568,12 @@ static int __devinit i8042_check_aux(voi
 		    (param && param != 0xfa && param != 0xff))
 			return -1;
 
-		aux_loop_broken = 1;
+/*
+ * If AUX_LOOP completed without error but returned unexpected data
+ * mark it as broken
+ */
+		if (!retval)
+			aux_loop_broken = 1;
 	}
 
 /*

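The behavioural change in the patch above can be summarised with a small
standalone sketch. This is not the driver code: the two function names are
hypothetical, retval stands for the result of the AUX_LOOP command (0 on
success, nonzero on failure such as a timeout), param is the byte read
back, and the early "return -1" from the external connection test is
assumed not to trigger.

```c
#include <stdbool.h>

/* Before the patch: any failure or any unexpected byte marks the loop broken. */
bool loop_broken_before(int retval, unsigned char param)
{
	return retval || param != 0x5a;
}

/* After the patch: only a successful command that returns wrong data does. */
bool loop_broken_after(int retval, unsigned char param)
{
	return !retval && param != 0x5a;
}
```

The only case that differs is a failed command: it used to set
aux_loop_broken and, with the patch applied, no longer does.
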
From: Ash Milsted
Date: Wednesday, March 7, 2007 - 4:49 pm

On Wed, 7 Mar 2007 21:25:34 +0000

Apologies for only replying to my own mails, but I need to be CC'd if
any alternative is to be convenient :)

Linus, thanks for your detailed messages. I will try to get a bisect done, but 
the university firewall is likely to put up a fight against rsync as much as it
does with the git protocol. We will see. And yeah, that 2.6.21-git7 business was
a typo, should've been 2.6.20-git7, natch.

Anyway, here's the bootlog for Dmitry from a boot with broken keyboard (2.6.21-rc3):
Mar  7 23:16:41 joker syslog-ng[4349]: syslog-ng starting up; version='2.0.0' 
Mar  7 23:16:41 joker Linux version 2.6.21-beyondash (root@joker) (gcc version 4.1.2) #1 Wed Mar 7 11:39:45 GMT 2007
Mar  7 23:16:41 joker BIOS-provided physical RAM map:
Mar  7 23:16:41 joker sanitize start
Mar  7 23:16:41 joker sanitize end
Mar  7 23:16:41 joker copy_e820_map() start: 0000000000000000 size: 000000000009fc00 end: 000000000009fc00 type: 1
Mar  7 23:16:41 joker copy_e820_map() type is E820_RAM
Mar  7 23:16:41 joker copy_e820_map() start: 000000000009fc00 size: 0000000000000400 end: 00000000000a0000 type: 2
Mar  7 23:16:41 joker copy_e820_map() start: 00000000000f0000 size: 0000000000010000 end: 0000000000100000 type: 2
Mar  7 23:16:41 joker copy_e820_map() start: 0000000000100000 size: 000000001fef0000 end: 000000001fff0000 type: 1
Mar  7 23:16:41 joker copy_e820_map() type is E820_RAM
Mar  7 23:16:41 joker copy_e820_map() start: 000000001fff0000 size: 0000000000008000 end: 000000001fff8000 type: 3
Mar  7 23:16:41 joker copy_e820_map() start: 000000001fff8000 size: 0000000000008000 end: 0000000020000000 type: 4
Mar  7 23:16:41 joker copy_e820_map() start: 00000000fec00000 size: 0000000000001000 end: 00000000fec01000 type: 2
Mar  7 23:16:41 joker copy_e820_map() start: 00000000fee00000 size: 0000000000001000 end: 00000000fee01000 type: 2
Mar  7 23:16:41 joker copy_e820_map() start: 00000000fff80000 size: 0000000000080000 end: 0000000100000000 type: 2
Mar  7 23:16:41 joker BIOS-e820: ...
From: Linus Torvalds
Date: Wednesday, March 7, 2007 - 5:21 pm

The non-working setup doesn't get any interrupts back, and thus doesn't 
see the ACK for the "\xd4\xed" command.

It really looks interrupt-related (especially considering that it goes 
away when you ask ACPI to not do certain things), but at the same time, 
the differences between -git6 and -git7 really don't seem to have *any* 
ACPI or PCI irq routing changes, so I think this really is related to the 
input-layer, and perhaps the real difference between ACPI irq routing and 
not is just the timing or IO access patterns that you get when you use the 
local apic vs the i8259 legacy irq controller.

For example, if there is a edge-triggered interrupt involved (and both 
keyboard *and* mouse are edge-triggered), the io-apic and the i8259 work 
differently: temporarily disabling the interrupt will reset the edge 
trigger logic on the i8259, but not on an IO-APIC.

So the lack of interrupts could be due to the input layer not clearing the 
interrupt source during setup, so some *old* interrupt just stays around, 
and because it's always set, on an IO-APIC it will never show as an edge 
at all - but on the i8259 the very action of registering the irq routine 
will create an edge.
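
To make that hypothesis concrete, here is a toy model in C. It only encodes
the behaviour described in the last two paragraphs (an unmask resets the
edge logic on the i8259 but not on an IO-APIC); it is not a model of real
hardware or of the kernel's irq code, and every name in it is made up for
illustration.

```c
#include <stdbool.h>

/* Toy model only: names and fields are hypothetical. */
enum irq_chip { CHIP_I8259, CHIP_IOAPIC };

struct edge_irq {
	enum irq_chip chip;
	bool line;		/* current level of the device's interrupt line */
	bool last_seen;		/* level the edge detector saw last */
	bool masked;
	unsigned int delivered;	/* interrupts that reached the handler */
};

/* Deliver an interrupt only on an unmasked low-to-high transition. */
static void irq_sample(struct edge_irq *irq)
{
	if (!irq->masked && irq->line && !irq->last_seen)
		irq->delivered++;
	irq->last_seen = irq->line;
}

/* Start masked, with the detector having already seen the boot-time level. */
void irq_init(struct edge_irq *irq, enum irq_chip chip, bool line_at_boot)
{
	irq->chip = chip;
	irq->line = line_at_boot;
	irq->last_seen = line_at_boot;
	irq->masked = true;
	irq->delivered = 0;
}

/* Registering a handler unmasks the line. */
void irq_unmask(struct edge_irq *irq)
{
	irq->masked = false;
	if (irq->chip == CHIP_I8259)
		irq->last_seen = false;	/* unmask resets the edge logic */
	irq_sample(irq);
}

/* The device raises or drops its interrupt line. */
void irq_set_line(struct edge_irq *irq, bool level)
{
	irq->line = level;
	irq_sample(irq);
}
```

In this model, a line that is already high at boot produces one interrupt
on unmask with CHIP_I8259 and none with CHIP_IOAPIC; the IO-APIC case only
sees an interrupt once the line has dropped and risen again, which is what
clearing the interrupt source during setup would achieve.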

There's some reason to believe that you may have a pending interrupt 

so there was certainly *something* unexpected there.

I'm not saying that's it, but it could explain why something that looks 
interrupt-related and that changes depending on whether you use ACPI to 
set up interrupts or not can have these kinds of reasons, that just depend 
on which interrupt controller the kernel happens to use, even though it's 
not "really" about lost interrupts at all, but just a driver that doesn't 
acknowledge a pending one.

Or something.

Doing the git bisect would really help.

		Linus

From: Ash Milsted
Date: Thursday, March 8, 2007 - 4:58 am

On Wed, 7 Mar 2007 23:49:14 +0000

Yup, that patch really hit the spot. Now I don't have to bisect :)

Thanks for your help (both of you),

Ash