[PATCH] Return value from schedule()

Previous thread: none

Next thread: [PATCH 3/3] sky2: use pci_read_vpd to read info during boot by Stephen Hemminger on Wednesday, September 3, 2008 - 4:00 pm. (4 messages)
From: Stephen Hemminger
Date: Wednesday, September 3, 2008 - 3:57 pm

Accessing the VPD area can take a long time. There are comments in the
SysKonnect vendor driver that it can take up to 25ms.  The existing vpd
access code fails consistently on my hardware.

Change the access routines to:
  * use a mutex rather than spinning with IRQ's disabled and lock held
  * have a longer timeout
  * call schedule while spinning to provide some responsivness

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>


--- a/drivers/pci/access.c	2008-09-02 10:42:12.000000000 -0700
+++ b/drivers/pci/access.c	2008-09-03 08:47:49.000000000 -0700
@@ -133,7 +133,7 @@ PCI_USER_WRITE_CONFIG(dword, u32)
 
 struct pci_vpd_pci22 {
 	struct pci_vpd base;
-	spinlock_t lock; /* controls access to hardware and the flags */
+	struct mutex lock;
 	u8	cap;
 	bool	busy;
 	bool	flag; /* value of F bit to wait for */
@@ -144,29 +144,30 @@ static int pci_vpd_pci22_wait(struct pci
 {
 	struct pci_vpd_pci22 *vpd =
 		container_of(dev->vpd, struct pci_vpd_pci22, base);
-	u16 flag, status;
-	int wait;
+	u16 flag = vpd->flag ? PCI_VPD_ADDR_F : 0;
+	unsigned long timeout = jiffies + (vpd->flag ? HZ/50 : HZ/10);
+	u16 status;
 	int ret;
 
 	if (!vpd->busy)
 		return 0;
 
-	flag = vpd->flag ? PCI_VPD_ADDR_F : 0;
-	wait = vpd->flag ? 10 : 1000; /* read: 100 us; write: 10 ms */
-	for (;;) {
-		ret = pci_user_read_config_word(dev,
-						vpd->cap + PCI_VPD_ADDR,
-						&status);
-		if (ret < 0)
-			return ret;
+	while ( (ret = pci_user_read_config_word(dev,
+						 vpd->cap + PCI_VPD_ADDR,
+						 &status)) == 0) {
 		if ((status & PCI_VPD_ADDR_F) == flag) {
 			vpd->busy = false;
-			return 0;
+			break;
 		}
-		if (wait-- == 0)
+
+		if (time_after(jiffies, timeout))
 			return -ETIMEDOUT;
-		udelay(10);
+		if (signal_pending(current))
+			return -EINTR;
+		schedule();
 	}
+
+	return ret;
 }
 
 static int pci_vpd_pci22_read(struct pci_dev *dev, int pos, int size,
@@ -183,7 +184,7 @@ static int pci_vpd_pci22_read(struct pci
 	if (size == 0)
 		return ...
From: Matthew Wilcox
Date: Thursday, September 4, 2008 - 5:52 am

Wow, that's slow.  If you were to try to read all 32k, it'd take more


This should be:

+	if (mutex_lock_interruptible(&vpd->lock))

And the same here, of course.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--

From: Ben Hutchings
Date: Thursday, September 4, 2008 - 7:19 am

[...]

This is fine for the sysfs case, but not if this is called during device
probe - we don't want signals to modprobe to break device initialisation,
do we?

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
--

From: Matthew Wilcox
Date: Thursday, September 4, 2008 - 9:10 am

Probably only fatal signals -- in which case the if (signal_pending)
check should be a fatal_signal_pending() and mutex_lock_interruptible()
should be mutex_lock_killable().

OTOH, who's signalling modprobe to do anything other than die?

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--

From: Stephen Hemminger
Date: Thursday, September 4, 2008 - 9:32 am

On Thu, 4 Sep 2008 15:19:46 +0100

It's bad but not that bad more details are:

				MIN		MAX
 -------------------------------------------------------------------
 write				1.8 ms		3.6 ms
 internal write cyles		0.7 ms		7.0 ms
 -------------------------------------------------------------------
 over all program time	 	2.5 ms		10.6 ms
 read				1.3 ms		2.6 ms
 -------------------------------------------------------------------
 over all 			3.8 ms		13.2 ms


Usable VPD is limited to 2K so worst case read is 27 seconds.
Note: there doesn't appear to be an standard for VPD size register in

Why not, it makes sense to allow killing a stuck modprobe.
--

From: Matthew Wilcox
Date: Thursday, September 4, 2008 - 9:07 am

In some circumstances, you want to wait for an event to happen.  let's
assume that it's a hardware event, so you can't just add a notifier of
some kind, you have to poll.  Here's an example:


If there's no other task ready to run, schedule() could return in much
less than 10 microseconds (actually, it could return in much less than
10 microseconds even if another task does run, but let's ignore that case).

If schedule() returned whether or not it had scheduled another task, we
could do something like:

		if (!schedule())
			udelay(10);

Please consider this patch:

It can be useful to know whether a call to schedule() ran another task
or not.  For example, if you're giving up the CPU while waiting for an
event to happen, you might want to delay if no other process wants the
CPU to ensure you're not just bashing away at a device that is unlikely
to have finished yet.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5270d44..39c6ef9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -329,7 +329,7 @@ extern signed long schedule_timeout(signed long timeout);
 extern signed long schedule_timeout_interruptible(signed long timeout);
 extern signed long schedule_timeout_killable(signed long timeout);
 extern signed long schedule_timeout_uninterruptible(signed long timeout);
-asmlinkage void schedule(void);
+asmlinkage int schedule(void);
 
 struct nsproxy;
 struct user_namespace;
diff --git a/kernel/sched.c b/kernel/sched.c
index 04160d2..ba3ab9a 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4337,8 +4337,9 @@ pick_next_task(struct rq *rq, struct task_struct *prev)
 
 /*
  * schedule() is the main scheduler function.
+ * It returns 1 if we scheduled a new task and 0 if no other task was chosen.
  */
-asmlinkage void __sched schedule(void)
+asmlinkage int __sched schedule(void)
 {
 	struct task_struct *prev, *next;
 	unsigned long *switch_count;
@@ -4411,6 +4412,7 @@ ...
From: Ingo Molnar
Date: Thursday, September 4, 2008 - 9:14 am

hm, i'm not really sure - this really just seems to be a higher prio 
variant of yield() combined with some weird code. Do we really want to 
promote such arguably broken behavior? If there's any chance of any 
polling to take a material amount of CPU time it should be event driven 
to begin with.

	Ingo
--

From: Matthew Wilcox
Date: Thursday, September 4, 2008 - 9:21 am

Oh, I'm not concerned about CPU utilisation, I'm concerned about PCI bus
utilisation.  Perhaps I'd like a yield_timeout() function instead where
I say that I'd like to not run for at least 10 microseconds?

Can we do that, or are we still jiffie-based there?

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--

From: Arjan van de Ven
Date: Thursday, September 4, 2008 - 10:30 am

On Thu, 4 Sep 2008 10:21:11 -0600

use schedule_hrtimerout() for this (hopefully will be in 2.6.28);
see this weeks LWN for an article describing it
--

From: Matthew Wilcox
Date: Thursday, September 4, 2008 - 10:48 am

OK, so something like:

	struct timespec ts = { 0, 10 * 1000 };

	set_task_state(TASK_INTERRUPTIBLE);
	schedule_hrtimeout(&ts, HRTIMER_MODE_REL);
	if (fatal_signal_pending())
		return -EINTR;

should do the trick.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--

From: Stephen Hemminger
Date: Thursday, September 4, 2008 - 12:05 pm

On Thu, 4 Sep 2008 11:48:45 -0600

Never mind, I changed it to just yield() in revision.
--

From: Peter Zijlstra
Date: Friday, September 5, 2008 - 12:40 am

Gah, not another yield in the network code we have to figure out wtf its
meant to do.



--

Previous thread: none

Next thread: [PATCH 3/3] sky2: use pci_read_vpd to read info during boot by Stephen Hemminger on Wednesday, September 3, 2008 - 4:00 pm. (4 messages)