Hi,
By accident, while creating the .config and compiling the kernel for my new laptop (ThinkPad T500) before some desperately needed sleep, I activated DMAR...
I don't know if this is relevant, but I thought I'd better report it.
This was on fb478da5ba69ecf40729ae8ab37ca406b1e5be48, sometime after 2.6.27-rc7.
I stumbled over two buglets:
First:
[ 4184.617392] DMAR:[DMA Read] Request device [03:00.0] fault addr fa946000
[ 4184.617393] DMAR:[fault reason 06] PTE Read access is not set
[ 4184.644081] iwlagn: Microcode HW error detected. Restarting.
[ 4186.646000] psmouse.c: TouchPad at isa0060/serio1/input0 lost synchronization, throwing 1 bytes away.
[ 4186.683034] Registered led device: iwl-phy0:radio
[ 4186.683478] Registered led device: iwl-phy0:assoc
[ 4186.683793] Registered led device: iwl-phy0:RX
[ 4186.684094] Registered led device: iwl-phy0:TX
[ 4186.689749] wlan0: authenticate with AP 00:1d:7e:42:fe:42
[ 4186.691691] wlan0: authenticated
[ 4186.691705] wlan0: associate with AP 00:1d:7e:42:fe:42
[ 4186.696380] wlan0: RX ReassocResp from 00:1d:7e:42:fe:42 (capab=0x411 status=0 aid=2)
[ 4186.696392] wlan0: associated
Most of the time when this happened, the machine did not react for 1-3 seconds and had audio buffer underruns, but I also had one hard lockup which I couldn't diagnose so far.
Second:
[ 2937.484251] DMAR:[DMA Read] Request device [00:1f.2] fault addr fffbf000
[ 2937.484255] DMAR:[fault reason 06] PTE Read access is not set
[ 2937.484297] ata1.00: exception Emask 0x60 SAct 0x1 SErr 0x800 action 0x6 frozen
[ 2937.484303] ata1.00: irq_stat 0x20000000, host bus error
[ 2937.484309] ata1: SError: { HostInt }
[ 2937.484319] ata1.00: cmd 61/08:00:c0:1d:6b/00:00:07:00:00/40 tag 0 ncq 4096 out
[ 2937.484321] res ...

Ouch, a host bus error is serious nastiness...
http://ata.wiki.kernel.org/index.php/Libata_error_messages#Error_classes
That's the ATA controller falling over after some serious machine hiccups.
Jeff
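The two DMAR faults in the report each name a requesting device and a faulting address. As a minimal sketch, assuming the message layout is exactly as quoted above (other kernel versions may print these lines differently), the pairs can be pulled out of dmesg output like this. The function name `parse_dmar_faults` and the regular expressions are illustrative and not part of any kernel tool.

```python
import re

# Patterns modelled on the two DMAR lines quoted in the report (assumption:
# the format does not differ on the kernel being examined).
REQUEST_RE = re.compile(
    r"DMAR:\[DMA (?P<op>Read|Write)\] Request device \[(?P<dev>[0-9a-f:.]+)\] "
    r"fault addr (?P<addr>[0-9a-f]+)"
)
REASON_RE = re.compile(r"DMAR:\[fault reason (?P<code>\d+)\] (?P<text>.*)")


def parse_dmar_faults(log_text):
    """Return one dict per DMAR fault found in log_text."""
    faults = []
    for line in log_text.splitlines():
        req = REQUEST_RE.search(line)
        if req:
            faults.append({
                "op": req.group("op"),
                "device": req.group("dev"),
                "addr": int(req.group("addr"), 16),
                "reason": None,
            })
            continue
        reason = REASON_RE.search(line)
        # Attach the reason line to the most recent request line.
        if reason and faults and faults[-1]["reason"] is None:
            faults[-1]["reason"] = (
                int(reason.group("code")),
                reason.group("text").strip(),
            )
    return faults


SAMPLE = """\
[ 4184.617392] DMAR:[DMA Read] Request device [03:00.0] fault addr fa946000
[ 4184.617393] DMAR:[fault reason 06] PTE Read access is not set
[ 2937.484251] DMAR:[DMA Read] Request device [00:1f.2] fault addr fffbf000
[ 2937.484255] DMAR:[fault reason 06] PTE Read access is not set
"""

if __name__ == "__main__":
    for fault in parse_dmar_faults(SAMPLE):
        print(fault["device"], fault["op"], hex(fault["addr"]), fault["reason"])
```

Grouping the result by device makes it easier to see whether the wireless card (03:00.0) and the SATA controller (00:1f.2) fault independently or together.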
Hi Jeff,
On Friday 26 September 2008, you wrote in "Re: bad DMAR interaction with
I only hit that with DMAR activated (hit it twice, on different boots), so it seems to be related to that.
Is there anything I can do to help debug it?
Andres
No idea about DMAR. On the ATA side, it pretty much diagnoses itself, as you see here. Unfortunately, the ATA controller is behaving exactly as it should when a major system error is thrown its way.
Jeff
The way to debug this is to figure out why device 00:1f.2 is trying to read from DMA address fffbf000 and does not have permission to do so. This could be indicative of a driver bug where it is programming the device to read from some buffer that has not been allocated through the DMA API and thus does not have a valid IOMMU mapping, or a hardware quirk where the device tries to read from memory without host involvement. The former is much more likely.
Cheers,
Muli
--
The First Workshop on I/O Virtualization (WIOV '08)
Dec 2008, San Diego, CA, http://www.usenix.org/wiov08/
SYSTOR 2009---The Israeli Experimental Systems Conference
http://www.haifa.il.ibm.com/conferences/systor2009/
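To make the first failure mode concrete, here is a toy model of an IOMMU translation table. It is only a sketch: `ToyIommu`, `DmaFault`, the 4 KiB page granularity, and the address 0x10000000 are all assumptions made for illustration, and none of this is how the kernel or the DMAR hardware is actually implemented. It shows the one point made above: a device read succeeds only for pages that were explicitly mapped for that device, and anything else faults.

```python
PAGE_SIZE = 4096  # assumed 4 KiB mapping granularity for this toy model


class DmaFault(Exception):
    """Raised when a device touches an address it has no mapping for."""


class ToyIommu:
    """Toy per-device translation table (illustrative, not kernel code)."""

    def __init__(self):
        # (device, page number) -> set of permitted operations
        self._table = {}

    def map_page(self, device, addr, perms=("read", "write")):
        # Stands in for a driver mapping a buffer through the DMA API.
        self._table[(device, addr // PAGE_SIZE)] = set(perms)

    def unmap_page(self, device, addr):
        self._table.pop((device, addr // PAGE_SIZE), None)

    def access(self, device, addr, op):
        perms = self._table.get((device, addr // PAGE_SIZE), set())
        if op not in perms:
            raise DmaFault(
                f"device {device} fault addr {addr:x}: {op} access is not set"
            )
        return True


if __name__ == "__main__":
    iommu = ToyIommu()
    iommu.map_page("00:1f.2", 0x10000000)        # buffer mapped by the driver
    iommu.access("00:1f.2", 0x10000000, "read")  # allowed
    try:
        # Address the driver never mapped, as in the reported fault.
        iommu.access("00:1f.2", 0xfffbf000, "read")
    except DmaFault as exc:
        print(exc)
```

In this model a read after `unmap_page` faults in the same way, which is another pattern a driver bug could produce.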
and indeed matches experience from myself and Marcel that DMA bugs seem to lurk. johannes
On Fri, Sep 26, 2008 at 6:12 PM, Johannes Berg wrote:
Meanwhile, all reported bugs in this case point to 64-bit installations. I'll give it more testing.
Thanks,
Tomas
Hi,
On Saturday 27 September 2008, Tomas Winkler wrote in "Re: bad DMAR
Would it help to test on 32-bit? I have some disk with a 32-bit system installed lying around somewhere...
Any other patches to try?
Andres
I've posted a few patches lately to address some RX buffer issues; you may want to try those. Not sure it will help, though.
http://marc.info/?l=linux-wireless&m=122241327108723&w=2
http://marc.info/?l=linux-wireless&m=122241327208729&w=2
Thanks,
Tomas
Andres, can you post your config? johannes
Hi,
On Monday 06 October 2008, Johannes Berg wrote in "Re: bad DMAR interaction
Btw, will do so later this evening having access to the older harddisk (with
Sure, my currently running one is attached.
The config I had the error with was exactly the same, just with CONFIG_DMAR and e1000e enabled (but it is overwritten now)...
It's no problem to try another branch, more debugging options, or so if needed.
Andres
[Attachment: config-2.6.27-rc7]
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.27-rc7
# Tue Sep 30 15:47:39 2008
#
CONFIG_64BIT=y
# CONFIG_X86_32 is not set
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
# CONFIG_GENERIC_LOCKBREAK is not set
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
# CONFIG_GENERIC_GPIO is not set
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
# CONFIG_RWSEM_XCHGADD_ALGORITHM is not set
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not ...
No, I was just wondering whether x86-64 had something like powerpc's CONFIG_64K_PAGES, but it doesn't seem to. 2M-page support always seems to depend on the CPU, but I have no idea how you can tell whether or not your CPU supports that.
johannes
2MB pages (and 4MB pages) are dependent on PSE/PAE; there's no configurable page size on x86 like there is on other platforms. PSE gives you 4MB pages; PAE reduces your 4MB pages to 2MB pages (for extra flag and address bits). About the only useful places for these are large mappings like ioremap and whatnot.
regards,
Kyle
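The 4MB versus 2MB figures follow from the size of a page-table entry. With 4-byte entries a 4 KiB table holds 1024 entries (10 index bits), while the 8-byte entries used with PAE hold only 512 (9 index bits). The sketch below works through that arithmetic. It also shows one way to answer the earlier question about CPU support, assuming a Linux system where the `flags` line of /proc/cpuinfo lists `pse` and `pae`. The helper names `large_page_size` and `cpu_flags` are made up for this example.

```python
BASE_PAGE_SHIFT = 12  # 4 KiB base pages on x86
TABLE_BYTES = 4096    # one page-table page


def large_page_size(entry_bytes):
    """Bytes covered by one entry of the level above the last page table."""
    entries = TABLE_BYTES // entry_bytes   # 1024 for 4-byte, 512 for 8-byte
    index_bits = entries.bit_length() - 1  # log2 of a power of two
    return 1 << (BASE_PAGE_SHIFT + index_bits)


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


if __name__ == "__main__":
    print("non-PAE (4-byte entries):", large_page_size(4) // 2**20, "MiB")
    print("PAE (8-byte entries):", large_page_size(8) // 2**20, "MiB")
    try:
        with open("/proc/cpuinfo") as fh:
            flags = cpu_flags(fh.read())
        print("pse:", "pse" in flags, "pae:", "pae" in flags)
    except OSError:
        print("/proc/cpuinfo not available on this system")
```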
Thanks for the explanation. Can you explain too why iwlwifi crashes when I enable 64k pages? ;) johannes
Thanks. I've also been chasing a DMA corruption issue with iwlagn (on
I suspect the hard lockup was due to a BUG_ON in the iwlagn driver. If you can reproduce this, either try applying the patch here [1] or go to a VC to see if it crashes there. It's a BUG_ON in iwl-tx.c.
johannes
[1] http://article.gmane.org/gmane.linux.kernel.wireless.general/21226
Hi,
On Friday 26 September 2008, you wrote in "Re: bad DMAR interaction with
Could not reproduce so far. It is rather hard to work on the machine with DMAR enabled because I get 1-5s lockups all the time, as described above...
Andres
