Re: Lenovo 3000 N100 i8042 problems

Previous thread: Misc fixes for 2.6.27 by David Woodhouse on Monday, September 1, 2008 - 1:14 pm. (17 messages)

Next thread: [path] don't diff generated firmware files by Arjan van de Ven on Monday, September 1, 2008 - 3:09 pm. (8 messages)
From: Daniel Barkalow
Date: Monday, September 1, 2008 - 2:46 pm

In 2.6.25.10, I'm finding that my i8042 seems to die after a while. In the 
middle of using the keyboard and mouse, generally before some key release 
is handled, it stops taking any input.

This seems to be due to 2a2dcd65e232eafd9fb6da1250f83adb57787b42; it works 
fine with that reverted. Perhaps that quirk is being applied too widely? 
Perhaps the workaround doesn't actually work on my computer? I couldn't 
find the bug report that led to that patch, so I'm not sure if I've been 
having whatever problem it was for all along and I never noticed, or if my 
3000 N100 is just different (Lenovo seems to have given a 
specific-sounding number to some very different hardware).

	-Daniel
*This .sig left intentionally blank*
--

From: Jiri Kosina
Date: Monday, September 1, 2008 - 4:29 pm

Hi Daniel,

thanks for tracking down the commit. Also, please don't forget to CC the 
commit author in such cases :)

Could you please send a dmidecode output from your system, so that we can 
compare it to the one provided by Christopher, as he has the system that 
apparently needs the nomux quirk to work correctly? It's indeed possible 
that there are various systems out there, and the DMI match has to be made 
more strict.

Thanks,

-- 
Jiri Kosina
SUSE Labs

--

From: Daniel Barkalow
Date: Monday, September 1, 2008 - 5:23 pm

Oh, right, the Author field. I confused myself by finding you as the 

Attached.

	-Daniel
*This .sig left intentionally blank*
--

From: Jiri Kosina
Date: Tuesday, September 2, 2008 - 2:23 am

Hmm, so you have

Handle 0x0001, DMI type 1, 27 bytes
System Information
        Manufacturer: LENOVO
        Product Name: 076836U
        Version: 3000 N100
        Serial Number: L3H0536
        UUID: 6747DA31-D471-11DA-901B-000FB0C9A0C9
        Wake-up Type: Power Switch
        SKU Number: Not Specified
        Family: Not Specified

Handle 0x0002, DMI type 2, 8 bytes
Base Board Information
        Manufacturer: LENOVO
        Product Name: MPAD-MSAE Customer Reference Boards
        Version: Not Applicable
        Serial Number: 41W1220Z1ZBUA6551DK

and the system on which Christopher originally reported this bug to me was

Handle 0x0001, DMI type 1, 27 bytes
System Information
        Manufacturer: LENOVO
        Product Name: 076804U
        Version: 3000 N100
        Serial Number: L3HX754
        UUID: DA02FA2F-A0AC-11DB-A093-000FB0D2560C
        Wake-up Type: Power Switch
        SKU Number: Intel
        Family: Lenovo

Handle 0x0002, DMI type 2, 8 bytes
Base Board Information
        Manufacturer: LENOVO
        Product Name: CAPELL VALLEY(NAPA) CRB
        Version: Not Applicable
        Serial Number: 41W8025Z1ZCZ971N36R


so the product names of both System and Base Board are different, and 
apparently the systems differ. Dmitry, what fields would you propose to be 
put in the DMI matching here? I will do the patch then.
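
A minimal sketch of the comparison above, assuming the usual dmidecode text
layout (unindented section title, indented "Field: value" lines). The helper
names parse_dmidecode and differing_fields are hypothetical, and the serial
numbers and UUIDs are left out of the sample data because they always differ:

```python
def parse_dmidecode(text):
    """Parse dmidecode text into {section title: {field: value}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line or line.startswith("Handle "):
            current = None  # a blank line or handle header ends a section
        elif not line[0].isspace():
            current = sections.setdefault(line, {})  # unindented: section title
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return sections


def differing_fields(a, b):
    """Return {(section, field): (value_a, value_b)} for fields that differ."""
    diffs = {}
    for section in a.keys() & b.keys():
        for field in a[section].keys() & b[section].keys():
            if a[section][field] != b[section][field]:
                diffs[(section, field)] = (a[section][field], b[section][field])
    return diffs


# Values copied from the two dumps quoted in this thread.
DANIEL = """\
Handle 0x0001, DMI type 1, 27 bytes
System Information
        Manufacturer: LENOVO
        Product Name: 076836U
        Version: 3000 N100
        Wake-up Type: Power Switch
        SKU Number: Not Specified
        Family: Not Specified

Handle 0x0002, DMI type 2, 8 bytes
Base Board Information
        Manufacturer: LENOVO
        Product Name: MPAD-MSAE Customer Reference Boards
        Version: Not Applicable
"""

CHRISTOPHER = """\
Handle 0x0001, DMI type 1, 27 bytes
System Information
        Manufacturer: LENOVO
        Product Name: 076804U
        Version: 3000 N100
        Wake-up Type: Power Switch
        SKU Number: Intel
        Family: Lenovo

Handle 0x0002, DMI type 2, 8 bytes
Base Board Information
        Manufacturer: LENOVO
        Product Name: CAPELL VALLEY(NAPA) CRB
        Version: Not Applicable
"""

diffs = differing_fields(parse_dmidecode(DANIEL), parse_dmidecode(CHRISTOPHER))
for (section, field), (mine, other) in sorted(diffs.items()):
    print(f"{section} / {field}: {mine!r} vs {other!r}")
```

Vendor and Version are identical on both machines, so only the product
names (plus SKU Number and Family) can tell them apart.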

Thanks,

-- 
Jiri Kosina
SUSE Labs
--

From: Henrique de Moraes Holschuh
Date: Tuesday, September 2, 2008 - 5:43 am

Are you sure you shouldn't be looking at BIOS version, instead? I don't know
how the Lenovo N100 series is, but chances are their i8042 is emulated
inside the ACPI EC, i.e. a firmware upgrade can change the i8042 behaviour.

I'd check BIOS versions, and ask people to upgrade to the latest, to see if
the problem goes away (or changes).  Then you will know for sure the best
approach.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--

From: Dmitry Torokhov
Date: Tuesday, September 2, 2008 - 5:51 am

Unfortunately we can't do "less than" type of comparison on DMI data,


Keeping compatibility with the other OS is the safest way so unless
Vista started using active MUX there is probably an update to the BIOS
fixing legacy mux mode.

-- 
Dmitry
--

From: Dmitry Torokhov
Date: Tuesday, September 2, 2008 - 5:43 am

I guess we could use System's product name to differentiate between
Christopher's and Daniel's boards. Although I must admit it is the very
first time I have seen a box that behaves better with active mux. Does
Vista use active mux nowadays? Because if it does not then I bet there
is (or shortly will be) a BIOS update fixing legacy mode on Daniel's
box.
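
To illustrate why the product name can separate the two boards, here is a
small userspace model of the quirk matching. It is a sketch, not kernel code:
it assumes that each DMI_MATCH entry is a substring test on one DMI field and
that all entries of a quirk must match. The dmi_matches function and the
dictionaries are hypothetical stand-ins; the field values come from the two
dmidecode dumps in this thread.

```python
def dmi_matches(quirk, machine):
    """Model of a quirk entry: every (field, substring) pair must match."""
    return all(substr in machine.get(field, "") for field, substr in quirk)


# The existing quirk keys on vendor and product version.
VERSION_QUIRK = [
    ("DMI_SYS_VENDOR", "LENOVO"),
    ("DMI_PRODUCT_VERSION", "3000 N100"),
]

# A narrower quirk keyed on the System product name instead.
NAME_QUIRK = [
    ("DMI_SYS_VENDOR", "LENOVO"),
    ("DMI_PRODUCT_NAME", "076804U"),
]

DANIEL = {
    "DMI_SYS_VENDOR": "LENOVO",
    "DMI_PRODUCT_NAME": "076836U",
    "DMI_PRODUCT_VERSION": "3000 N100",
}

CHRISTOPHER = {
    "DMI_SYS_VENDOR": "LENOVO",
    "DMI_PRODUCT_NAME": "076804U",
    "DMI_PRODUCT_VERSION": "3000 N100",
}

for label, quirk in (("version", VERSION_QUIRK), ("name", NAME_QUIRK)):
    print(label, "quirk:",
          "Daniel", dmi_matches(quirk, DANIEL),
          "Christopher", dmi_matches(quirk, CHRISTOPHER))
```

Under this model the version-based entry catches both machines, while the
name-based entry applies nomux to Christopher's machine only. A substring test
also cannot express "BIOS older than X", which is the limitation with
"less than" comparisons mentioned elsewhere in the thread.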

-- 
Dmitry
--

From: Daniel Barkalow
Date: Tuesday, September 2, 2008 - 9:16 am

Mine's actually old (came with XP). It's still got the original BIOS 
(because I haven't found a way to upgrade the BIOS without reformatting my 
hard drive to include Windows), and I remember there being an upgrade 
available, but I don't think it had anything to do with keyboard/trackpad 
stuff.

In what way does active mux usually behave badly? It's possible that 
legacy mode only has a bug that doesn't matter to Windows, and active mux 
may have some of the usual problems but nothing I particularly noticed.

I noticed that, when my i8042 would stop working, it would generally have 
just delivered one mouse interrupt to CPU1 after never previously doing 
so. Perhaps there's some sort of deadlock in the Linux i8042 driver when 
both cores are unexpectedly getting interrupts from the two devices at 
once? I could understand there being a Linux bug only triggered by quirky 
hardware that only applies to legacy mode, which was just uncovered by 
this patch.
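
One way to watch for that per-CPU pattern is to compare snapshots of
/proc/interrupts before and after the hang. The sketch below assumes the
usual layout (a header row of CPU names, then one row per IRQ with the
handler name last). The function name is hypothetical and the sample numbers
are invented for illustration, not taken from my machine:

```python
def i8042_irq_counts(proc_interrupts_text):
    """Return {irq: [count per CPU]} for rows whose handler is i8042."""
    lines = proc_interrupts_text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or fields[-1] != "i8042":
            continue
        irq = fields[0].rstrip(":")
        counts[irq] = [int(x) for x in fields[1:1 + ncpus]]
    return counts


# Illustrative snapshot only; the counts are made up.
SAMPLE = """\
           CPU0       CPU1
  0:     250312          0   IO-APIC-edge      timer
  1:      10342          0   IO-APIC-edge      i8042
 12:      58811          1   IO-APIC-edge      i8042
"""

for irq, per_cpu in sorted(i8042_irq_counts(SAMPLE).items(), key=lambda kv: int(kv[0])):
    print(f"IRQ {irq}: {per_cpu}")
```

Feeding it the contents of /proc/interrupts from a running system shows
whether the keyboard (IRQ 1) or AUX (IRQ 12) counters start moving on a CPU
that never saw them before.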

	-Daniel
*This .sig left intentionally blank*
--

From: Dmitry Torokhov
Date: Wednesday, September 3, 2008 - 7:26 am

It usually manifests with a touchpad/mouse missing because they don't
respond to kernel's queries. Quite a few Fujitsus exhibit this

I am not sure, internally the kernel still deals with 2 interrupt
sources (KBD and AUX) regardless whether it is in legacy or active
multiplexing mode...

Does it take long to trigger the bug? You could try doing "echo 1 >
/sys/module/i8042/parameters/debug" and then send me dmesg or
/var/log/messages after the bug was triggered - I might see something
there. But please be aware that if you send me such a log I can decode
everything that you have been typing...
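
The same toggle as a small script, in case it is handier than the echo. This
is only a sketch: set_i8042_debug is a hypothetical helper, it has to run as
root, and it assumes the i8042 module exposes its debug parameter under
/sys/module as a writable file.

```python
from pathlib import Path

# Assumed location of the parameter; only present when i8042 is loaded.
DEBUG_PARAM = Path("/sys/module/i8042/parameters/debug")


def set_i8042_debug(enabled, param=DEBUG_PARAM):
    """Write 1 or 0 to the debug parameter and return what it reads back."""
    param.write_text("1\n" if enabled else "0\n")
    return param.read_text().strip()
```

Call set_i8042_debug(True) before reproducing the hang and
set_i8042_debug(False) afterwards, so the log does not keep recording
keystrokes longer than needed.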

-- 
Dmitry
--

From: Daniel Barkalow
Date: Wednesday, September 3, 2008 - 10:16 am

Right from the beginning? I'm not seeing that on any kernel with this 
hardware. I don't suppose the kernel could detect that it's using active 
mux and one of the devices isn't responding, and use legacy mode in that 
case, and only use quirks for systems where the active mux does something 

It's usually within an hour of the right usage pattern. I'll try to 
trigger it with debugging on while not typing anything secret Thursday 
evening.

The other thing that might be useful, if there's some way to find out, is 
whether the kernel lost an interrupt somehow, since this feels like the 
hardware is waiting patiently for a lost interrupt to get serviced. Also, 
is there some way to get the kernel to re-initialize the i8042? It might 
be useful to see if the firmware has really stopped working or if the 
kernel is just failing to do anything further with it. I can unbind the 
driver, but I don't seem to be able to bind it again.

	-Daniel
*This .sig left intentionally blank*
--

From: Dmitry Torokhov
Date: Wednesday, September 3, 2008 - 12:06 pm

I understand. Jiri, how did active MUX problem manifest on

Well, we need a device to respond to our queries to figure out if it
is present or not ;) The box may not have any devices attached but


This is as close as it gets. Binding should cause the controller to be
flushed of any pending data and start afresh.

-- 
Dmitry
--

From: Jiri Kosina
Date: Wednesday, September 3, 2008 - 1:03 pm

If Christopher doesn't respond himself, I will dig it out from my 
archives, it has been quite some time already since this has been 
originally reported.

If I remember correctly, when 'nomux' wasn't used, psmouse used to 
complain a lot about losing synchronization and then the mouse pointer 
either went crazy or froze completely.

Christoph?

Thanks,

-- 
Jiri Kosina
SUSE Labs
--

From: Daniel Barkalow
Date: Thursday, September 4, 2008 - 5:05 pm

Attached. This has me typing some unimportant stuff, and then it sticks, 
then I plug in a USB keyboard, then I tried unbinding the i8042 and 
binding it again; the audio stuttered briefly, and recovered, and I did it 
again, and then saved this log.

	-Daniel
*This .sig left intentionally blank*
--

From: Dmitry Torokhov
Date: Thursday, September 4, 2008 - 5:46 pm

That is untruth :))

-- 
Dmitry
--

From: Daniel Barkalow
Date: Thursday, September 4, 2008 - 8:27 pm

Can I blame my keyboard? No, probably not, I sent that with the keyboard 
working. In any case, now it's attached for real.

	-Daniel
*This .sig left intentionally blank*
--

From: Jiri Kosina
Date: Wednesday, September 3, 2008 - 4:50 am

I guess so, yes.

On the other hand, this might also be viewed as regression (we made 
Daniel's hardware behave worse with recent kernel than it did before), so 
I think we still would like to have this fixed. What about the patch 
below, adding the match on System's product name, as you suggested? 
Thanks.


From: Jiri Kosina <jkosina@suse.cz>
Subject: [PATCH] Input: i8042 - make Lenovo 3000 N100 blacklist entry more specific

Apparently, there are more different versions of Lenovo 3000 N100, some
of them working properly with active mux, and some of them requiring it
being switched off.

This patch applies 'nomux' only to the specific product name that is
reported to behave badly unless 'nomux' is specified.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
 drivers/input/serio/i8042-x86ia64io.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/input/serio/i8042-x86ia64io.h b/drivers/input/serio/i8042-x86ia64io.h
index 3282b74..5aafe24 100644
--- a/drivers/input/serio/i8042-x86ia64io.h
+++ b/drivers/input/serio/i8042-x86ia64io.h
@@ -305,7 +305,7 @@ static struct dmi_system_id __initdata i8042_dmi_nomux_table[] = {
 		.ident = "Lenovo 3000 n100",
 		.matches = {
 			DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
-			DMI_MATCH(DMI_PRODUCT_VERSION, "3000 N100"),
+			DMI_MATCH(DMI_PRODUCT_NAME, "076804U"),
 		},
 	},
 	{
-- 
1.5.4.5
--

From: Dmitry Torokhov
Date: Wednesday, September 3, 2008 - 7:20 am

I agree. Daniel, could you please try the patch to make sure it
restores the previous behavior for you and I will push it through.

Thanks!

-- 
Dmitry
--

From: Daniel Barkalow
Date: Wednesday, September 3, 2008 - 10:18 am

I'll test that Thursday as well; is there some quick way to determine 
whether you're using active mux or not?

	-Daniel
*This .sig left intentionally blank*
--

From: Dmitry Torokhov
Date: Wednesday, September 3, 2008 - 12:07 pm

Do "dmesg | grep serio".. If you see 4 AUX ports being created then
KBC is in active multiplexing mode.
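
The check can also be scripted. The sketch below assumes the ports show up in
the log as "i8042 AUX0 port" through "i8042 AUX3 port" in active mode and as
a single "i8042 AUX port" in legacy mode. The mux_mode name is hypothetical
and the sample lines are illustrative, not copied from a real log:

```python
import re

# Matches "i8042 AUX port" (legacy) and "i8042 AUX<n> port" (active mux).
AUX_PORT = re.compile(r"i8042 AUX(\d*) port")


def mux_mode(dmesg_text):
    """Classify the KBC mode from the distinct AUX ports seen in dmesg."""
    ports = set(AUX_PORT.findall(dmesg_text))  # set: ignore repeated lines
    if len(ports) == 4:
        return "active"
    if len(ports) == 1:
        return "legacy"
    return "unknown"


# Illustrative log excerpts only.
ACTIVE_SAMPLE = """\
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX0 port at 0x60,0x64 irq 12
serio: i8042 AUX1 port at 0x60,0x64 irq 12
serio: i8042 AUX2 port at 0x60,0x64 irq 12
serio: i8042 AUX3 port at 0x60,0x64 irq 12
"""

LEGACY_SAMPLE = """\
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
"""

print(mux_mode(ACTIVE_SAMPLE), mux_mode(LEGACY_SAMPLE))
```

Pass it the output of dmesg from the machine under test. Anything other than
one or four AUX ports is reported as "unknown" rather than guessed.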

-- 
Dmitry
--

From: Daniel Barkalow
Date: Thursday, September 4, 2008 - 4:57 pm

That patch, on top of 2.6.25.10 does give me 4 AUX ports, so I think it is 
getting the previous behavior as expected.

	-Daniel
*This .sig left intentionally blank*
--

From: Henrique de Moraes Holschuh
Date: Wednesday, September 3, 2008 - 2:32 pm

I feel I need to warn you guys that you are likely breaking machines that
match that DMI info but have a newer BIOS, unless they use different BIOSes
(not enough data without a full dmidecode output from the other machine).

But I really don't care either way, since this is not about ThinkPads :)

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--

From: Jiri Kosina
Date: Wednesday, September 3, 2008 - 2:36 pm

I wouldn't dare to say "breaking". Just using 'nomux' shouldn't really 
_break_ anything, unless the BIOS is somehow seriously hosed.

-- 
Jiri Kosina
SUSE Labs
--

From: Daniel Barkalow
Date: Wednesday, September 3, 2008 - 3:03 pm

The patch under consideration is to restore pre-2.6.25 behavior (i.e., 
active mux) for machines other than the one in a particular bug report, 
while 2.6.25 broke my machine. So this will probably rebreak machines that 
were broken until 2.6.25 (and can't break anything else). I think it would 
actually be better if we could apply the quirk to all models of 3000 N100 
except for mine (but I don't think quirk-matching supports that); my 
model is the only one we know of which came with a BIOS that has issues 
with legacy mode. I still think it's weird that Lenovo managed to break 
active mux when they'd had it working before, but who knows what's going 
on in their firmware development process.

In any case, I suspect that the legacy behavior on my machine is strange 
but manageable (given that Windows doesn't seem to have had problems using 
legacy mode even on my hardware, so far as I can tell), and we should be 
able to cope with it in general.

	-Daniel
*This .sig left intentionally blank*
--

From: Renato S. Yamane
Date: Monday, September 8, 2008 - 12:41 pm

dmidecode just from 3000-N100?
Attached a dmidecode from a Lenovo Thinkpad T61.
I have a Lenovo 3000-V200 too. You want a dmidecode from it?

Regards,
Renato
--

From: Daniel Barkalow
Date: Monday, September 8, 2008 - 12:55 pm

It's almost certainly only 3000 series that's interesting; I think they 
test the Thinkpads with Linux and wouldn't ship with a quirky BIOS there. 
The 3000 series only officially supports Windows, and so there can be 
problems (evidently, mine does something odd with the legacy mux, and 
newer ones do something odd with the active mux).

You might want to poke at the quirk in the patch in this thread and see if 
one or the other mode works better, or if they're the same on your 
machines. In any case, neither the patch that got into 2.6.25 nor the 
narrowing patch in this thread would affect either of your machines.

	-Daniel
*This .sig left intentionally blank*
--

From: Renato S. Yamane
Date: Monday, September 8, 2008 - 8:35 pm

Here is the dmidecode from a Lenovo 3000-V200.
I hope this helps.
Let me know if you need more info.

Best regards,
Renato S. Yamane
--

From: Henrique de Moraes Holschuh
Date: Monday, September 8, 2008 - 8:42 pm

Well, does that box suffer either of the issues (breaks either with the
patch or without the patch)?


-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--

From: Renato S. Yamane
Date: Tuesday, September 9, 2008 - 6:37 am

I use the 2.6.26 kernel available in Debian Lenny and don't have any problems.

Best regards,
Renato
--

From: Henrique de Moraes Holschuh
Date: Monday, September 8, 2008 - 1:24 pm

No, we'd need the dmidecode output of the two *specific* 3000-N100
machines involved in the issue, so that we can know the specific BIOS
version they are running.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--
