Re: 2.6.25-git16 -- BUG: unable to handle kernel NULL pointer dereference at 00000000 -- IP: [<c02dd5d5>] fw_show_drv_device_ids+0xd9/0xee

Previous thread: [PATCH] PNP: fix missing kernel-doc notation by Randy Dunlap on Wednesday, April 30, 2008 - 5:18 pm. (2 messages)

Next thread: linux-next build error by Daniel Hazelton on Wednesday, April 30, 2008 - 5:50 pm. (2 messages)

BUG: unable to handle kernel NULL pointer dereference at 00000000
IP: [&lt;c02dd5d5&gt;] fw_show_drv_device_ids+0xd9/0xee
*pde = 00000000
Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
Modules linked in: aes_generic sbp2 snd_intel8x0 snd_ac97_codec arc4
ac97_bus snd_pcm_oss snd_mixer_oss parport_pc parport ecb
crypto_blkcipher snd_pcm cryptomgr crypto_algapi snd_seq_dummy
snd_seq_oss snd_seq_midi_event b43 snd_seq mac80211 snd_timer
snd_seq_device cfg80211 led_class snd snd_page_alloc i2c_nforce2
i2c_core video1394

Pid: 5535, comm: head Not tainted (2.6.25-git16 #6)
EIP: 0060:[&lt;c02dd5d5&gt;] EFLAGS: 00210206 CPU: 0
EIP is at fw_show_drv_device_ids+0xd9/0xee
EAX: c04f4f4c EBX: 00000018 ECX: f0bba000 EDX: f0bba000
ESI: 00000000 EDI: f7bbe268 EBP: f0b17f3c ESP: f0b17f2c
 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process head (pid: 5535, ti=f0b17000 task=f4805e80 task.ti=f0b17000)
Stack: f0bba000 f0bba000 f7bfb2d0 f7bbe268 f0b17f4c c028895f f5220e70 c04ef508
       f0b17f74 c01a62bb 00002000 bfd261a7 f5220e84 c04ef508 f7bfb2d0 00002000
       f45aee00 bfd261a7 f0b17f90 c0172807 f0b17f9c c01a622a f45aee00 fffffff7
Call Trace:
 [&lt;c028895f&gt;] ? drv_attr_show+0x1f/0x23
 [&lt;c01a62bb&gt;] ? sysfs_read_file+0x91/0xee
 [&lt;c0172807&gt;] ? vfs_read+0x78/0xa8
 [&lt;c01a622a&gt;] ? sysfs_read_file+0x0/0xee
 [&lt;c0172b2e&gt;] ? sys_read+0x3b/0x5d
 [&lt;c0105e4b&gt;] ? sysenter_past_esp+0x78/0xd1
 =======================
Code: 01 b8 3a e5 48 c0 0f 44 c2 50 68 f5 ed 49 c0 51 e8 a8 5c f3 ff
8b 4d f0 01 c6 83 c4 10 01 f1 85 ff 74 05 c6 01 0a 46 41 83 c3 18 &lt;8b&gt;
43 e8 85 c0 0f 85 37 ff ff ff 8d 65 f4 89 f0 5b 5e 5f c9 c3

This issue seems to have to do with the following problem:

ohci1394: fw-host0: SelfID received outside of bus reset sequence
IEEE 1394 device has ROM CRC error
IEEE 1394 device has ROM CRC error
Program usplash tried to access /dev/mem between 0-&gt;8000000.
ieee1394: Node added: ID:BUS[0-00:1023]  GUID[00d0f54000000173]
ieee1394: Host added: ID:BUS[0-01:1023]  GUID[00e018000027d06c]
IEEE 1394 device has ROM CRC ...

[...]

Is everything alright again with this patch?

From: Stefan Richter &lt;stefanr@s5r6.in-berlin.de&gt;
Subject: ieee1394: fix NULL pointer dereference in sysfs access

Regression since &quot;ieee1394: prevent device binding of raw1394,
video1394, dv1394&quot;, commit d2ace29fa44589da51fedc06a67b3f05301f3bfd:
$ cat /sys/bus/ieee1394/drivers/raw1394/device_ids
triggers a NULL pointer dereference in fw_show_drv_device_ids.

Reported-by: Miles Lane &lt;miles.lane@gmail.com&gt;
Signed-off-by: Stefan Richter &lt;stefanr@s5r6.in-berlin.de&gt;
---
 drivers/ieee1394/nodemgr.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Index: linux/drivers/ieee1394/nodemgr.c
===================================================================
--- linux.orig/drivers/ieee1394/nodemgr.c
+++ linux/drivers/ieee1394/nodemgr.c
@@ -520,8 +520,11 @@ static ssize_t fw_show_drv_device_ids(st
 	char *scratch = buf;
 
 	driver = container_of(drv, struct hpsb_protocol_driver, driver);
+	id = driver-&gt;id_table;
+	if (!id)
+		return 0;
 
-	for (id = driver-&gt;id_table; id-&gt;match_flags != 0; id++) {
+	for (; id-&gt;match_flags != 0; id++) {
 		int need_coma = 0;
 
 		if (id-&gt;match_flags &amp; IEEE1394_MATCH_VENDOR_ID) {

-- 
Stefan Richter
-=====-==--- -=-= ----=
http://arcgraph.de/sr/

--


Hi Stefan.  Thanks for your patch.  It worked great.  Now I see this
in the logs:

ohci1394: fw-host0: SelfID received outside of bus reset sequence
IEEE 1394 device has ROM CRC error
IEEE 1394 device has ROM CRC error
Program usplash tried to access /dev/mem between 0-&gt;8000000.
ieee1394: Node added: ID:BUS[0-00:1023]  GUID[00d0f54000000173]
ieee1394: Host added: ID:BUS[0-01:1023]  GUID[00e018000027d06c]
IEEE 1394 device has ROM CRC error
IEEE 1394 device has ROM CRC error
IEEE 1394 device has ROM CRC error
IEEE 1394 device has ROM CRC error
IEEE 1394 device has ROM CRC error

Also, &quot;cat /sys/bus/ieee1394/drivers/raw1394/device_ids&quot; no longer
causes an error.

Is it necessary for the code to spit out so many CRC errors for the
OrangeMicro iBot?  Also, are these errors indicating a significant
problem with the iBot?

Cheers,
     Miles
--


I could look into keeping those messages down to only a single one per 

The config ROM is an area in the IEEE 1394 address space of a node which 
contains information about the device functions.  It consists of a 
header, a root directory, and subdirectories.  Each one of those has 
CRCs as one safeguard (among others) against integrity issues with the 
data read from the ROM.

A few devices, especially older ones, got the CRC algorithm wrong. Hence 
we don't reject those devices, we only warn about them as a starting 
point for further diagnosis in case that more than this goes wrong when 
using the device.

(The ROM CRC error warning has been added in 2.6.22.  Before Linux 
2.6.22, the ROM CRC errors were silently accepted.  The new alternative 
firewire drivers also do not warn about ROM CRC errors... yet...)
-- 
Stefan Richter
-=====-==--- -=-= ----=
http://arcgraph.de/sr/
--


I like to be warned about things, even if there isn't anything I can 
really do about it (fix my IEEE 1394 ROM CRC? :) But there doesn't seem 
to be any point to repeating the message.

Do the repeats come because it prints each time the config ROM is accessed?

Cheers,
Dan

-- 
                     /--------------- - -  -  -   -   -
                     |  Dan Noé
                     |  http://isomerica.net/~dpn/
--


There are three reasons which cause repetition:

   - While we fetch and parse the config ROM, another bus resets happens.
     We then need to start over reading the config ROM.  This kind of
     repetition cannot easily be prevented, but hopefully happens less
     often.

   - A variation of the theme:  Device plugged out, plugged in again.
     Causes the config ROM to be at least partially fetched and parsed
     again.

   - If the firmware author did the CRC algorithm wrong, he got it wrong
     for the bus information block, the root directory, and each
     subdirectory or leaf (unit directories, instance directories,
     textual descriptor leaves, icon descriptors...).  Giving a log
     notice about each of these CRCs is surely redundant.

When I added the log notice, I didn't try to suppress at least the 
latter kind of repetition because I figured that this kind of mistake is 
rare nowadays.  (If I'm not mistaken, there have for example been 
clarifications in IEEE 1212-2001.)  Of course it isn't rare to those 
people who happen to work with affected devices all the time.  So I will 
try to reduce the log spam.

BTW, &quot;fix my IEEE 1394 ROM CRC&quot; can sometimes be done by firmware update.
-- 
Stefan Richter
-=====-==--- -=-= ----=
http://arcgraph.de/sr/
--

From: Stefan Richter
Date: Friday, May 2, 2008 - 11:14 am

This avoids redundant messages about a special and usually harmless
firmware flaw.

Signed-off-by: Stefan Richter &lt;stefanr@s5r6.in-berlin.de&gt;
---
 drivers/ieee1394/csr1212.c |   32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)

Index: linux/drivers/ieee1394/csr1212.c
===================================================================
--- linux.orig/drivers/ieee1394/csr1212.c
+++ linux/drivers/ieee1394/csr1212.c
@@ -1049,6 +1049,24 @@ int csr1212_read(struct csr1212_csr *csr
 	return -ENOENT;
 }
 
+/*
+ * Apparently there are many different wrong implementations of the CRC
+ * algorithm.  We don't fail, we just warn... approximately once per GUID.
+ */
+static void
+csr1212_check_crc(const u32 *buffer, size_t length, u16 crc, __be32 *guid)
+{
+	static u64 last_bad_eui64;
+	u64 eui64 = ((u64)be32_to_cpu(guid[0]) &lt;&lt; 32) | be32_to_cpu(guid[1]);
+
+	if (csr1212_crc16(buffer, length) == crc ||
+	    csr1212_msft_crc16(buffer, length) == crc ||
+	    eui64 == last_bad_eui64)
+		return;
+
+	printk(KERN_DEBUG &quot;ieee1394: config ROM CRC error\n&quot;);
+	last_bad_eui64 = eui64;
+}
 
 /* Parse a chunk of data as a Config ROM */
 
@@ -1092,11 +1110,8 @@ static int csr1212_parse_bus_info_block(
 			return ret;
 	}
 
-	/* Apparently there are many different wrong implementations of the CRC
-	 * algorithm.  We don't fail, we just warn. */
-	if ((csr1212_crc16(bi-&gt;data, bi-&gt;crc_length) != bi-&gt;crc) &amp;&amp;
-	    (csr1212_msft_crc16(bi-&gt;data, bi-&gt;crc_length) != bi-&gt;crc))
-		printk(KERN_DEBUG &quot;IEEE 1394 device has ROM CRC error\n&quot;);
+	csr1212_check_crc(bi-&gt;data, bi-&gt;crc_length, bi-&gt;crc,
+			  &amp;csr-&gt;bus_info_data[3]);
 
 	cr = CSR1212_MALLOC(sizeof(*cr));
 	if (!cr)
@@ -1205,11 +1220,8 @@ int csr1212_parse_keyval(struct csr1212_
 		&amp;cache-&gt;data[bytes_to_quads(kv-&gt;offset - cache-&gt;offset)];
 	kvi_len = be16_to_cpu(kvi-&gt;length);
 
-	/* Apparently there are many different wrong implementations of the CRC
-	 * algorithm.  We don't fail, we just ...
Previous thread: [PATCH] PNP: fix missing kernel-doc notation by Randy Dunlap on Wednesday, April 30, 2008 - 5:18 pm. (2 messages)

Next thread: linux-next build error by Daniel Hazelton on Wednesday, April 30, 2008 - 5:50 pm. (2 messages)