Re: USB: FIx locks and urb->status in adutux

Previous thread: [PATCH -v2 5/7] pull RT tasks by Steven Rostedt on Monday, October 22, 2007 - 7:59 pm. (1 message)

Next thread: [PATCH -v2 2/7] track highest prio queued on runqueue by Steven Rostedt on Monday, October 22, 2007 - 7:59 pm. (1 message)
From: Pete Zaitcev
Date: Monday, October 22, 2007 - 8:34 pm

Two main issues fixed here are:
 - An improper use of in-struct lock to protect an open count
 - Use of urb status for -EINPROGRESS

Also, along the way:
 - Change usb_unlink_urb to usb_kill_urb. Apparently there's no need
   to use usb_unlink_urb whatsoever in this driver, and the old use of
   usb_kill_urb was outright racy (it unlinked and immediately freed).
 - Fix indentation in adu_write. Looks like it was damaged by a script.

Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>

---

If someone with a real device tested this, it would be great. Please
make sure to pull the plug while application is opening/closing.

A critical review is welcome too.

Vitaliy, this is the static lock example, which I promised on Friday.

diff --git a/drivers/usb/misc/adutux.c b/drivers/usb/misc/adutux.c
index c567aa7..a8afb66 100644
--- a/drivers/usb/misc/adutux.c
+++ b/drivers/usb/misc/adutux.c
@@ -79,12 +79,22 @@ MODULE_DEVICE_TABLE(usb, device_table);
 
 #define COMMAND_TIMEOUT	(2*HZ)	/* 60 second timeout for a command */
 
+/*
+ * The locking scheme is a vanilla 3-lock:
+ *   adu_device.buflock: A spinlock, covers what IRQs touch.
+ *   adutux_mutex:       A Static lock to cover open_count. It would also cover
+ *                       any globals, but we don't have them in 2.6.
+ *   adu_device.mtx:     A mutex to hold across sleepers like copy_from_user.
+ *                       It covers all of adu_device, except the open_count
+ *                       and what .buflock covers.
+ */
+
 /* Structure to hold all of our device specific stuff */
 struct adu_device {
-	struct mutex		mtx; /* locks this structure */
+	struct mutex		mtx;
 	struct usb_device*	udev; /* save off the usb device pointer */
 	struct usb_interface*	interface;
-	unsigned char		minor; /* the starting minor number for this device */
+	unsigned int		minor; /* the starting minor number for this device */
 	char			serial_number[8];
 
 	int			open_count; /* number of times this port has been opened ...
From: Oliver Neukum
Date: Tuesday, October 23, 2007 - 2:38 am

This leaves a race here:

	while (count > 0) {
		spin_lock_irqsave(&dev->buflock, flags);
		if (!dev->out_urb_finished) {
			spin_unlock_irqrestore(&dev->buflock, flags);

			timeout = COMMAND_TIMEOUT;
			while (timeout > 0) {
				if (signal_pending(current)) {
					dbg(1," %s : interrupted", __FUNCTION__);
					retval = -EINTR;
					goto exit;
				}
				mutex_unlock(&dev->mtx);
				timeout = interruptible_sleep_on_timeout(&dev->write_wait, timeout);

You can detect that the urb has not finished but the notification happens before

This is racy:
	dev = usb_get_intfdata(interface);
	if (!dev) {
		retval = -ENODEV;
		goto exit_no_device;
	}

	if ((retval = mutex_lock_interruptible(&adutux_mutex))) {
		dbg(2, "%s : mutex lock failed", __FUNCTION__);
		goto exit_no_device;
	}

You need to manipulate intfdata under lock, or disconnect will

You should set file->private_data to NULL in the error case.


I find it a bit silly to bother with _irqsave if you call schedule_timeout() in the
next line.

	Regards
		Oliver
-

From: Pete Zaitcev
Date: Tuesday, October 23, 2007 - 2:38 pm

Is it always safe to kill unfilled URBs? The filled but unsubmitted ones
are ok, but in this case it's possible that we only allocated something
but never submitted. Our current implementation happens to be safe by
virtue of ->dev being NULL in such case. I do not remember if we always
guaranteed that and since Vitaly is going to take this code for a
backport, I decided to play it safe.


That's a common issue with sleep_on and its derivatives really.
I would need to replace it with explicit queue adds to fix.



What for? Nobody ever tests it or dereferences it after ->open returned

True, this is not really necessary... And I set and clear such flags
without locking around them when I'm sure that URB is not in progress.
I just added it in case someone wants to expand this code by testing
two things together or something, because here the callback certainly
can strike at any time. The whole excercise started because Vitaliy
wanted to reuse the code.

-- Pete
-

From: Oliver Neukum
Date: Wednesday, October 24, 2007 - 7:04 am

I am not sure as far as 2.4 is concerned. In fact I am not sure 2.4 has
usb_kill_urb() at all.

	Regards
		Oliver

-

From: Pete Zaitcev
Date: Tuesday, October 23, 2007 - 6:53 pm

Two main issues fixed here are:
 - An improper use of in-struct lock to protect an open count
 - Use of urb status for -EINPROGRESS

Also, along the way:
 - Change usb_unlink_urb to usb_kill_urb. Apparently there's no need
   to use usb_unlink_urb whatsoever in this driver, and the old use of
   usb_kill_urb was outright racy (it unlinked and immediately freed).
 - Fix indentation in adu_write. Looks like it was damaged by a script.
 - Remove sleep_on.

Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>

---

Oliver, thanks for the inftdata catch. I also fixed the sleep_on.
But I left the silly dances around usb_kill_urb in, at least for now.

diff --git a/drivers/usb/misc/adutux.c b/drivers/usb/misc/adutux.c
index c567aa7..d278161 100644
--- a/drivers/usb/misc/adutux.c
+++ b/drivers/usb/misc/adutux.c
@@ -79,12 +79,22 @@ MODULE_DEVICE_TABLE(usb, device_table);
 
 #define COMMAND_TIMEOUT	(2*HZ)	/* 60 second timeout for a command */
 
+/*
+ * The locking scheme is a vanilla 3-lock:
+ *   adu_device.buflock: A spinlock, covers what IRQs touch.
+ *   adutux_mutex:       A Static lock to cover open_count. It would also cover
+ *                       any globals, but we don't have them in 2.6.
+ *   adu_device.mtx:     A mutex to hold across sleepers like copy_from_user.
+ *                       It covers all of adu_device, except the open_count
+ *                       and what .buflock covers.
+ */
+
 /* Structure to hold all of our device specific stuff */
 struct adu_device {
-	struct mutex		mtx; /* locks this structure */
+	struct mutex		mtx;
 	struct usb_device*	udev; /* save off the usb device pointer */
 	struct usb_interface*	interface;
-	unsigned char		minor; /* the starting minor number for this device */
+	unsigned int		minor; /* the starting minor number for this device */
 	char			serial_number[8];
 
 	int			open_count; /* number of times this port has been opened */
@@ -107,8 +117,11 @@ struct adu_device {
 	char*			interrupt_out_buffer;
 	struct ...
From: Vitaliy Ivanov
Date: Wednesday, October 24, 2007 - 7:09 am

Pete,

On Wed, 2007-10-24 at 04:53, Pete Zaitcev wrote:


Don't you think it's needed? You call adu_abort_transfers from adu_release only where it's locked by adutux_mutex.


2)
Also one issue. adu_write has:

	/* send off the urb */
	usb_fill_int_urb(
		dev->interrupt_out_urb,
		dev->udev,
		usb_sndintpipe(dev->udev, dev->interrupt_out_endpoint->bEndpointAddress),
		dev->interrupt_out_buffer,
		bytes_to_write,
		adu_interrupt_out_callback,
		dev,
		dev->interrupt_in_endpoint->bInterval);

Seems like there should be not:
dev->interrupt_in_endpoint->bInterval
but:
dev->interrupt_out_endpoint->bInterval

Typo? Seems like that.




I just tried your patch on the adutux from 2.6.24-rc1 but on my 2.6.20.7. It failed with the following message:

-----------------
[176188.466898] drivers/usb/misc/adutux.c : adu_open : enter 
[176188.468000] drivers/usb/misc/adutux.c : adu_open : open count 1 
[176188.468601] drivers/usb/misc/adutux.c : adu_open : leave, return value 0  
[176188.486218] drivers/usb/misc/adutux.c :  adu_interrupt_in_callback : enter, status 0 
[176188.486269] drivers/usb/misc/adutux.c: adu_interrupt_in_callback - length = 8, data = 00 00 00 00 00 00 00 00 
[176188.486430] drivers/usb/misc/adutux.c: adu_interrupt_in_callback - length = 8, data = 00 00 00 00 00 00 00 00 
[176188.486482] drivers/usb/misc/adutux.c :  adu_interrupt_in_callback : leave, status 0 
[176191.303853] drivers/usb/misc/adutux.c :  adu_write : enter, count = 8 
[176193.301872] drivers/usb/misc/adutux.c : adu_write - command timed out. 
[176193.301912] drivers/usb/misc/adutux.c :  adu_write : leave, return value -110 
[176200.425157] drivers/usb/misc/adutux.c :  adu_write : enter, count = 8 
[176200.425776] list_add corruption. next->prev should be prev (c15e6c30), but was 00000000. (next=c58bdf5c).
[176200.425776] list_add corruption. next->prev should be prev (c15e6c30), but was 00000000. (next=c58bdf5c).
[176200.426475] ------------[ cut here ]------------
[176200.426475] ...
From: Pete Zaitcev
Date: Wednesday, October 24, 2007 - 8:20 pm

Yes, it's not needed anymore. It's a carry-over from the time

Looks like one. It actually is a risky thing to fix... What if the device
as broken descriptors and the driver worked only thanks to the bug above?

This is probably what Oliver caught, I did not have all removals

This looks like a debugging-caused oops. It should go away by itself

Please try attached on any of those.

Thank you,
-- Pete

diff --git a/drivers/usb/misc/adutux.c b/drivers/usb/misc/adutux.c
index c567aa7..bea6262 100644
--- a/drivers/usb/misc/adutux.c
+++ b/drivers/usb/misc/adutux.c
@@ -79,12 +79,22 @@ MODULE_DEVICE_TABLE(usb, device_table);
 
 #define COMMAND_TIMEOUT	(2*HZ)	/* 60 second timeout for a command */
 
+/*
+ * The locking scheme is a vanilla 3-lock:
+ *   adu_device.buflock: A spinlock, covers what IRQs touch.
+ *   adutux_mutex:       A Static lock to cover open_count. It would also cover
+ *                       any globals, but we don't have them in 2.6.
+ *   adu_device.mtx:     A mutex to hold across sleepers like copy_from_user.
+ *                       It covers all of adu_device, except the open_count
+ *                       and what .buflock covers.
+ */
+
 /* Structure to hold all of our device specific stuff */
 struct adu_device {
-	struct mutex		mtx; /* locks this structure */
+	struct mutex		mtx;
 	struct usb_device*	udev; /* save off the usb device pointer */
 	struct usb_interface*	interface;
-	unsigned char		minor; /* the starting minor number for this device */
+	unsigned int		minor; /* the starting minor number for this device */
 	char			serial_number[8];
 
 	int			open_count; /* number of times this port has been opened */
@@ -107,8 +117,11 @@ struct adu_device {
 	char*			interrupt_out_buffer;
 	struct usb_endpoint_descriptor* interrupt_out_endpoint;
 	struct urb*		interrupt_out_urb;
+	int			out_urb_finished;
 };
 
+static DEFINE_MUTEX(adutux_mutex);
+
 static struct usb_driver adu_driver;
 
 static void adu_debug_data(int level, const ...
From: Pete Zaitcev
Date: Wednesday, October 24, 2007 - 8:25 pm

BTW, the patch in my previous message had one buglet - an extra
remove_wait_queue. It was in an error case though, so it should be
ok to test. But just in case, here's a fixed one.

-- Pete

diff --git a/drivers/usb/misc/adutux.c b/drivers/usb/misc/adutux.c
index c567aa7..6914c0b 100644
--- a/drivers/usb/misc/adutux.c
+++ b/drivers/usb/misc/adutux.c
@@ -79,12 +79,22 @@ MODULE_DEVICE_TABLE(usb, device_table);
 
 #define COMMAND_TIMEOUT	(2*HZ)	/* 60 second timeout for a command */
 
+/*
+ * The locking scheme is a vanilla 3-lock:
+ *   adu_device.buflock: A spinlock, covers what IRQs touch.
+ *   adutux_mutex:       A Static lock to cover open_count. It would also cover
+ *                       any globals, but we don't have them in 2.6.
+ *   adu_device.mtx:     A mutex to hold across sleepers like copy_from_user.
+ *                       It covers all of adu_device, except the open_count
+ *                       and what .buflock covers.
+ */
+
 /* Structure to hold all of our device specific stuff */
 struct adu_device {
-	struct mutex		mtx; /* locks this structure */
+	struct mutex		mtx;
 	struct usb_device*	udev; /* save off the usb device pointer */
 	struct usb_interface*	interface;
-	unsigned char		minor; /* the starting minor number for this device */
+	unsigned int		minor; /* the starting minor number for this device */
 	char			serial_number[8];
 
 	int			open_count; /* number of times this port has been opened */
@@ -107,8 +117,11 @@ struct adu_device {
 	char*			interrupt_out_buffer;
 	struct usb_endpoint_descriptor* interrupt_out_endpoint;
 	struct urb*		interrupt_out_urb;
+	int			out_urb_finished;
 };
 
+static DEFINE_MUTEX(adutux_mutex);
+
 static struct usb_driver adu_driver;
 
 static void adu_debug_data(int level, const char *function, int size,
@@ -132,27 +145,31 @@ static void adu_debug_data(int level, const char *function, int size,
  */
 static void adu_abort_transfers(struct adu_device *dev)
 {
-	dbg(2," %s : enter", ...
From: Vitaliy Ivanov
Date: Monday, October 29, 2007 - 11:04 am

Finally had a time on my weekend to perform tests and fix Pete's patch a little. Now it seems to be correct.

Added a few changes:

1. One device per user space application. Old driver allowed many users application to work with same device which can lead to IO mess.
2. GFP_KERNEL is used instead of GFP_ATOMIC as we are now out of spin locks.
3. Added myself to maintainers(already discussed this in 2.4). Does anyone mind?

(Diff between this latest patch and the one from Pete is in attach).

Also part of Pete's notes:

Two main issues fixed here are:
 - An improper use of in-struct lock to protect an open count
 - Use of urb status for -EINPROGRESS

Also, along the way:
 - Change usb_unlink_urb to usb_kill_urb. Apparently there's no need
   to use usb_unlink_urb whatsoever in this driver, and the old use of
   usb_kill_urb was outright racy (it unlinked and immediately freed).
 - Fix indentation in adu_write. Looks like it was damaged by a script.
 - Remove sleep_on.


Here is the final patch. Comments are welcomed.

--

Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com>

Tested-by: Vitaliy Ivanov <vitalivanov@gmail.com>

--

diff -uprN -X linux-2.6.24-rc1.org/Documentation/dontdiff linux-2.6.24-rc1.org/drivers/usb/misc/adutux.c linux-2.6.24-rc1.my/drivers/usb/misc/adutux.c
--- linux-2.6.24-rc1.org/drivers/usb/misc/adutux.c	2007-10-29 19:25:00.000000000 +0200
+++ linux-2.6.24-rc1.my/drivers/usb/misc/adutux.c	2007-10-29 19:01:10.000000000 +0200
@@ -79,12 +79,22 @@ MODULE_DEVICE_TABLE(usb, device_table);
 
 #define COMMAND_TIMEOUT	(2*HZ)	/* 60 second timeout for a command */
 
+/*
+ * The locking scheme is a vanilla 3-lock:
+ *   adu_device.buflock: A spinlock, covers what IRQs touch.
+ *   adutux_mutex:       A Static lock to cover open_count. It would also cover
+ *                       any globals, but we don't have them in 2.6.
+ *   adu_device.mtx:     A mutex to hold across sleepers like copy_from_user.
+ ...
From: Pete Zaitcev
Date: Monday, October 29, 2007 - 9:24 pm

OK. This trick was popular in UNIX. Personally I think it's in a bad
taste, because good applications still need to verify if only one
instance is running, and threfore can use application level locking.
But if you are gunning for the maintenership I'm not going to argue
your style. The busy lock-out certainly works better than "/dev/cua" :-)


The finished flag is only set when URB is not in use anymore. Did you
observe an anomaly with my code? Any hangs? If so, I assure you this
is not the fix. As it's written, even if we ignore the failure (e.g.
do not pass it to userland), we sill have to maintain the correct
flag state.

-- Pete
-

From: Vitaliy Ivanov
Date: Tuesday, October 30, 2007 - 6:09 am

As about read_urb_finished probably it's OK. But we shouldn't decrease open_count in the case of error as then we return normal exit value.
Here is what we had before:

				 dev->interrupt_in_endpoint->bInterval);
		dev->read_urb_finished = 0;
		retval = usb_submit_urb(dev->interrupt_in_urb, GFP_KERNEL);
		if (retval) {
			dev->read_urb_finished = 1;
			--dev->open_count;
		}

So I can left it but w/o this line:
--dev->open_count;

What is more critical is that I added:

	/* initialize out direction */
	dev->out_urb_finished = 1;

Without this we'll always have write timeouts.

		Vitaliy

-

From: Pete Zaitcev
Date: Tuesday, October 30, 2007 - 2:54 pm

Oh, thanks. I fixed that.

One other small thing. I see you dropped the dev->mtx bracket that
I added around the initialization and submission. Notice that the
dev->udev is zeroed under dev->mtx, not the static lock. This is because
it has to be seen in read and write paths. If you don't like taking
across the submission, how about testing for it ahead of time?

Here's what I've got now:

diff --git a/drivers/usb/misc/adutux.c b/drivers/usb/misc/adutux.c
index c567aa7..5a2c44e 100644
--- a/drivers/usb/misc/adutux.c
+++ b/drivers/usb/misc/adutux.c
@@ -79,12 +79,22 @@ MODULE_DEVICE_TABLE(usb, device_table);
 
 #define COMMAND_TIMEOUT	(2*HZ)	/* 60 second timeout for a command */
 
+/*
+ * The locking scheme is a vanilla 3-lock:
+ *   adu_device.buflock: A spinlock, covers what IRQs touch.
+ *   adutux_mutex:       A Static lock to cover open_count. It would also cover
+ *                       any globals, but we don't have them in 2.6.
+ *   adu_device.mtx:     A mutex to hold across sleepers like copy_from_user.
+ *                       It covers all of adu_device, except the open_count
+ *                       and what .buflock covers.
+ */
+
 /* Structure to hold all of our device specific stuff */
 struct adu_device {
-	struct mutex		mtx; /* locks this structure */
+	struct mutex		mtx;
 	struct usb_device*	udev; /* save off the usb device pointer */
 	struct usb_interface*	interface;
-	unsigned char		minor; /* the starting minor number for this device */
+	unsigned int		minor; /* the starting minor number for this device */
 	char			serial_number[8];
 
 	int			open_count; /* number of times this port has been opened */
@@ -107,8 +117,11 @@ struct adu_device {
 	char*			interrupt_out_buffer;
 	struct usb_endpoint_descriptor* interrupt_out_endpoint;
 	struct urb*		interrupt_out_urb;
+	int			out_urb_finished;
 };
 
+static DEFINE_MUTEX(adutux_mutex);
+
 static struct usb_driver adu_driver;
 
 static void adu_debug_data(int level, const char *function, int ...
From: Vitaliy Ivanov
Date: Wednesday, October 31, 2007 - 4:54 am

[Empty message]
From: Pete Zaitcev
Date: Wednesday, October 31, 2007 - 3:01 pm

The paragraph you quoted above explains why dev->udev cannot be managed
under the static lock: because dev->udev is accessed by read/write methods,

No, I meant testing before the rest of the ->open method is executed,
sorry. This part is "ahead of" the rest:

@@ -267,54 +290,54 @@ static int adu_open(struct inode *inode, struct file *file)
 	}
 
 	dev = usb_get_intfdata(interface);
-	if (!dev) {
+	if (!dev || !dev->udev) {
 		retval = -ENODEV;
 		goto exit_no_device;
 	}


Very well, I'll post this for Greg anew today.

Do you still want to go ahead with a 2.4 backport?

-- Pete
-

From: Vitaliy Ivanov
Date: Thursday, November 1, 2007 - 2:06 am

Yep. Will do it asap. In latest 2.4 patch we had just a locks issue.
So do you think I should modify 2.4 patch with all modifications we done in 2.6 version included?

Vitaliy

-

From: Pete Zaitcev
Date: Thursday, November 1, 2007 - 10:28 am

Something like that, yes.

-- Pete
-

From: Oliver Neukum
Date: Wednesday, October 24, 2007 - 7:49 am

But you are leaving a function while still on a waitqueue left on the stack.
Here's a patch on top of yours.

	Regards
		Oliver

----

--- work/drivers/usb/misc/adutux.c.alt	2007-10-24 16:36:02.000000000 +0200
+++ work/drivers/usb/misc/adutux.c	2007-10-24 16:36:06.000000000 +0200
@@ -567,19 +567,21 @@ static ssize_t adu_write(struct file *fi
 
 	retval = mutex_lock_interruptible(&dev->mtx);
 	if (retval)
-		goto exit_nolock;
+		goto exit_nolock_intr;
 
 	/* verify that the device wasn't unplugged */
 	if (dev->udev == NULL) {
+		mutex_unlock(&dev->mtx);
 		retval = -ENODEV;
 		err("No device or device unplugged %d", retval);
-		goto exit;
+		goto exit_nolock_intr;
 	}
 
 	/* verify that we actually have some data to write */
 	if (count == 0) {
+		mutex_unlock(&dev->mtx);
 		dbg(1," %s : write request of 0 bytes", __FUNCTION__);
-		goto exit;
+		goto exit_nolock_intr;
 	}
 
 	add_wait_queue(&dev->write_wait, &waita);
@@ -649,13 +651,14 @@ static ssize_t adu_write(struct file *fi
 			bytes_written += bytes_to_write;
 		}
 	}
-	remove_wait_queue(&dev->write_wait, &waita);
 
 	retval = bytes_written;
 
 exit:
 	mutex_unlock(&dev->mtx);
 exit_nolock:
+	remove_wait_queue(&dev->write_wait, &waita);
+exit_nolock_intr:
 
 	dbg(2," %s : leave, return value %d", __FUNCTION__, retval);
 
-

From: Greg KH
Date: Wednesday, October 24, 2007 - 2:25 pm

Hm, I think I'll wait for Pete and you to agree and send me a final
version before applying anything here :)

thanks,

greg k-h
-

From: Vitaliy Ivanov
Date: Friday, October 26, 2007 - 2:57 am

Greg,


I'm testing the final patch Pete created with all the notes.
Will let you know if there are issues with it. If not it can be
applied to your tree.

Thanks,
Vitaliy
-

Previous thread: [PATCH -v2 5/7] pull RT tasks by Steven Rostedt on Monday, October 22, 2007 - 7:59 pm. (1 message)

Next thread: [PATCH -v2 2/7] track highest prio queued on runqueue by Steven Rostedt on Monday, October 22, 2007 - 7:59 pm. (1 message)