Re: [build failure] Re: Linux 2.6.24-rc4 on S390x

Previous thread: none

Next thread: [PATCH v2] Fix hardware IRQ time accounting problem. by Tony Breeds on Monday, December 3, 2007 - 10:51 pm. (1 message)
From: Linus Torvalds
Date: Monday, December 3, 2007 - 10:08 pm

We should have one week between -rc releases, but I was gone for a week 
over thanksgiving (as were some other kernel developers), so this one is a 
bit late. It's been almost the rule rather than the exception, but I 
promise I'll be better...

Anyway, there aren't a lot of exciting changes here, but there's still a 
_lot_ more churn than I really hoped for at the -rc4 stage. Blackfin, MIPS 
and Power do stand out in the diffstats, but ARM and x86 got some updates 
too.

And we had some ACPI churn (processor throttling etc), along with various 
driver updates: ATA, IDE, InfiniBand, SCSI, USB and network drivers. And 
on the filesystem side, cifs, NFS, ocfs2 and proc. Ugh. Too much.

In fact, the diff from -rc3 is almost 36,000 lines, and that's the smaller 
git one with the renames shown as renames (not the ones I upload as 
patches to kernel.org - those are done so that people with GNU patch and 
other legacy patch programs can use the diffs). I'll blame the two-week 
window for some of it, but even so, this is a bit disheartening. I'm 
really hoping that we're slowing down and -rc5 won't be anywhere near that 
large.

That said, none of the changes are really _exciting_ or really scary. And 
we should have fixed a number of regressions, although more certainly 
remain.

				Linus
--

From: Kamalesh Babulal
Date: Tuesday, December 4, 2007 - 3:23 am

Hi,

The patch "ctc: make use of alloc_netdev()" (commit 1c1478859017452a1179dbbdf7b9eb5b48438746)
introduces the following build failure:

  CC [M]  drivers/s390/net/fsm.o
  CC [M]  drivers/s390/net/smsgiucv.o
  CC [M]  drivers/s390/net/ctcmain.o
drivers/s390/net/ctcmain.c: In function `ctc_init_netdevice':
drivers/s390/net/ctcmain.c:2805: error: implicit declaration of function `SET_MODULE_OWNER'
make[2]: *** [drivers/s390/net/ctcmain.o] Error 1
make[1]: *** [drivers/s390/net] Error 2
make: *** [drivers/s390] Error 2

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
--

From: Martin Schwidefsky
Date: Tuesday, December 4, 2007 - 3:31 am

Hi Uschi,
that last patch reverted commit 10d024c1b2fd58af8362670d7d6e5ae52fc33353.
That commit needs to be re-added.

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.


--

From: Ingo Molnar
Date: Tuesday, December 4, 2007 - 3:32 am

the patch below should fix this.

	Ingo

------------>
Subject: drivers/s390/net/ctcmain.c: fix build bug
From: Ingo Molnar <mingo@elte.hu>

SET_MODULE_OWNER() is obsolete.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 drivers/s390/net/ctcmain.c |    1 -
 1 file changed, 1 deletion(-)

Index: linux/drivers/s390/net/ctcmain.c
===================================================================
--- linux.orig/drivers/s390/net/ctcmain.c
+++ linux/drivers/s390/net/ctcmain.c
@@ -2802,7 +2802,6 @@ void ctc_init_netdevice(struct net_devic
 	dev->type = ARPHRD_SLIP;
 	dev->tx_queue_len = 100;
 	dev->flags = IFF_POINTOPOINT | IFF_NOARP;
-	SET_MODULE_OWNER(dev);
 }
 
 
--

From: Nicolas Pitre
Date: Tuesday, December 4, 2007 - 6:22 am

Any reason for this:

 mode change 100644 => 100755 drivers/net/chelsio/cxgb2.c
 mode change 100644 => 100755 drivers/net/chelsio/pm3393.c
 mode change 100644 => 100755 drivers/net/chelsio/sge.c
 mode change 100644 => 100755 drivers/net/chelsio/sge.h


Nicolas
--

From: Jeff Garzik
Date: Tuesday, December 4, 2007 - 9:04 am

As repeatedly mentioned on the list :) it is a mistake.

	Jeff



--

From: Luiz Fernando N. Capitulino
Date: Tuesday, December 4, 2007 - 7:07 am

[Empty message]
--

From: Linus Torvalds
Date: Tuesday, December 4, 2007 - 8:56 am

Looks like we have a zero "cfs_rq->load.weight".

Ingo? Both sched_slice() and __sched_slice() do a divide by the runqueue 
weight, and at least dequeue_task_fair() explicitly checks for that being 
zero, so clearly zero is a possible value. Hmm?
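
The divide described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: it is not the kernel's sched_slice(), and the names slice_unguarded(), slice_or_zero(), period_ns, se_weight and rq_weight are hypothetical. It assumes the slice is the task's weighted share of a scheduling period, which is where a zero runqueue weight becomes a zero divisor.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified user-space model, NOT the kernel's sched_slice().
 * A task's slice is modeled as its share of the scheduling period,
 * scaled by se_weight / rq_weight. All names here are hypothetical.
 */
static uint64_t slice_unguarded(uint64_t period_ns, unsigned long se_weight,
                                unsigned long rq_weight)
{
    /* Caller must guarantee a busy runqueue; a zero divisor would trap. */
    assert(rq_weight != 0);
    return period_ns * se_weight / rq_weight;
}

/* Guarded variant: report a zero slice for an idle (zero-weight) runqueue. */
static uint64_t slice_or_zero(uint64_t period_ns, unsigned long se_weight,
                              unsigned long rq_weight)
{
    if (rq_weight == 0)
        return 0;
    return slice_unguarded(period_ns, se_weight, rq_weight);
}
```

In this model the check sits in the caller-facing wrapper rather than in the division itself, so code paths that only ever see a busy runqueue do not pay for it.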

		Linus
--

From: Luiz Fernando N. Capitulino
Date: Tuesday, December 4, 2007 - 9:04 am

Em Tue, 4 Dec 2007 17:00:05 +0100
Ingo Molnar <mingo@elte.hu> escreveu:

| 
| * Linus Torvalds <torvalds@linux-foundation.org> wrote:
| 
| > 
| > 
| > On Tue, 4 Dec 2007, Luiz Fernando N. Capitulino wrote:
| > >
| > > 	sched_rr_get_interval(1, NULL);
| > 
| > Looks like we have a zero "cfs_rq->load.weight".
| > 
| > Ingo? Both sched_slice() and __sched_slice() do a divide by the 
| > runqueue weight, and at least dequeue_task_fair() explicitly checks 
| > for that being zero, so clearly zero is a possible value. Hmm?
| 
| yeah, i can reproduce this crash too.
| 
| The problem is on SMP: if sched_rr_get_interval() gets a task from an 
| otherwise idle runqueue, then rq->load.weight is 0. Normally 
| sched_slice() is only used on a busy runqueue. So the correct fixup site 
| is not in sched_slice() but in sys_sched_rr_get_interval() - i'm working 
| on the right fix, i hope to be able to send a pull request in a few 
| minutes.

 Ingo, I can reproduce this w/o SMP support as well.

 (Also, the backtrace I sent was reproduced on a UP machine with an
SMP kernel.)
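
For anyone who wants to check their own kernel, the system call quoted above can be exercised from user space. The following is a minimal sketch, not the test program used in this thread; query_rr_interval() and interval_ns() are hypothetical helper names, and it passes a valid timespec pointer rather than NULL so that the reported interval can be inspected.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>

/*
 * Ask the kernel for the round-robin interval of `pid` (0 means the caller).
 * Returns 0 and fills *out on success, -1 with errno set on failure.
 * query_rr_interval() is a hypothetical wrapper, not part of any library.
 */
static int query_rr_interval(pid_t pid, struct timespec *out)
{
    return sched_rr_get_interval(pid, out);
}

/* Convert the reported interval to nanoseconds for easier comparison. */
static long long interval_ns(const struct timespec *ts)
{
    assert(ts != NULL);
    return (long long)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}
```

Calling query_rr_interval(1, &ts) mirrors the pid used in the report. Only run it on a machine you can afford to crash if the kernel might be affected.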

-- 
Luiz Fernando N. Capitulino
--

From: Ingo Molnar
Date: Tuesday, December 4, 2007 - 9:08 am

hm, if you run this as an RT task, right? Or can you trigger it via pure 
SCHED_OTHER tasks as well? Below is my candidate fix.

	Ingo

--------------->
Subject: sched: fix crash in sys_sched_rr_get_interval()
From: Ingo Molnar <mingo@elte.hu>

Luiz Fernando N. Capitulino reported that sched_rr_get_interval()
crashes for SCHED_OTHER tasks that are on an idle runqueue.

The fix is to return a 0 timeslice for tasks that are on an idle
runqueue. (and which are not running, obviously)

Reported-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -4850,17 +4850,21 @@ long sys_sched_rr_get_interval(pid_t pid
 	if (retval)
 		goto out_unlock;
 
-	if (p->policy == SCHED_FIFO)
-		time_slice = 0;
-	else if (p->policy == SCHED_RR)
+	/*
+	 * Time slice is 0 for SCHED_FIFO tasks and for SCHED_OTHER
+	 * tasks that are on an otherwise idle runqueue:
+	 */
+	time_slice = 0;
+	if (p->policy == SCHED_RR) {
 		time_slice = DEF_TIMESLICE;
-	else {
+	} else {
 		struct sched_entity *se = &p->se;
 		unsigned long flags;
 		struct rq *rq;
 
 		rq = task_rq_lock(p, &flags);
-		time_slice = NS_TO_JIFFIES(sched_slice(cfs_rq_of(se), se));
+		if (rq->cfs.load.weight)
+			time_slice = NS_TO_JIFFIES(sched_slice(&rq->cfs, se));
 		task_rq_unlock(rq, &flags);
 	}
 	read_unlock(&tasklist_lock);
--

From: Ingo Molnar
Date: Tuesday, December 4, 2007 - 9:18 am

the problem is on UP too - if there are no SCHED_OTHER tasks. I've 
tested the fix and it solves the problem for various combinations of 
crash.c. I've updated sched.git, please pull it from:

   git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git

It has another commit besides this fix. Thanks,

	Ingo

------------------>

Ingo Molnar (2):
      sched: fix crash in sys_sched_rr_get_interval()
      sched: default to more agressive yield for SCHED_BATCH tasks

 sched.c      |   14 +++++++++-----
 sched_fair.c |    7 ++++---
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 59ff6b1..b062856 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4850,17 +4850,21 @@ long sys_sched_rr_get_interval(pid_t pid, struct timespec __user *interval)
 	if (retval)
 		goto out_unlock;
 
-	if (p->policy == SCHED_FIFO)
-		time_slice = 0;
-	else if (p->policy == SCHED_RR)
+	/*
+	 * Time slice is 0 for SCHED_FIFO tasks and for SCHED_OTHER
+	 * tasks that are on an otherwise idle runqueue:
+	 */
+	time_slice = 0;
+	if (p->policy == SCHED_RR) {
 		time_slice = DEF_TIMESLICE;
-	else {
+	} else {
 		struct sched_entity *se = &p->se;
 		unsigned long flags;
 		struct rq *rq;
 
 		rq = task_rq_lock(p, &flags);
-		time_slice = NS_TO_JIFFIES(sched_slice(cfs_rq_of(se), se));
+		if (rq->cfs.load.weight)
+			time_slice = NS_TO_JIFFIES(sched_slice(&rq->cfs, se));
 		task_rq_unlock(rq, &flags);
 	}
 	read_unlock(&tasklist_lock);
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 37bb265..c33f0ce 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -799,8 +799,9 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int sleep)
  */
 static void yield_task_fair(struct rq *rq)
 {
-	struct cfs_rq *cfs_rq = task_cfs_rq(rq->curr);
-	struct sched_entity *rightmost, *se = &rq->curr->se;
+	struct task_struct *curr = rq->curr;
+	struct cfs_rq *cfs_rq = task_cfs_rq(curr);
+	struct ...
--

From: Luiz Fernando N. Capitulino
Date: Tuesday, December 4, 2007 - 9:40 am

Em Tue, 4 Dec 2007 17:18:27 +0100
Ingo Molnar <mingo@elte.hu> escreveu:

| 
| * Ingo Molnar <mingo@elte.hu> wrote:
| 
| > The problem is on SMP: if sched_rr_get_interval() gets a task from an 
| > otherwise idle runqueue, then rq->load.weight is 0. Normally 
| > sched_slice() is only used on a busy runqueue. So the correct fixup 
| > site is not in sched_slice() but in sys_sched_rr_get_interval() - i'm 
| > working on the right fix, i hope to be able to send a pull request in 
| > a few minutes.
| 
| the problem is on UP too - if there are no SCHED_OTHER tasks. I've 
| tested the fix and it solves the problem for various combinations of 
| crash.c. I've updated sched.git, please pull it from:
| 
|    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
| 
| It has another commit besides this fix. Thanks,

 Yes, I tested the 'sched: fix crash in sys_sched_rr_get_interval()'
one and it really fixes the problem.

 Thanks a lot Ingo.

-- 
Luiz Fernando N. Capitulino
--

From: Greg KH
Date: Tuesday, December 4, 2007 - 11:28 am

Can you make up something that I can apply for 2.6.23-stable?  or is
this not an issue on that tree?

thanks,

greg k-h
--

From: Luiz Fernando N. Capitulino
Date: Tuesday, December 4, 2007 - 11:41 am

Em Tue, 4 Dec 2007 10:28:51 -0800
Greg KH <greg@kroah.com> escreveu:

| On Tue, Dec 04, 2007 at 05:18:27PM +0100, Ingo Molnar wrote:
| > 
| > * Ingo Molnar <mingo@elte.hu> wrote:
| > 
| > > The problem is on SMP: if sched_rr_get_interval() gets a task from an 
| > > otherwise idle runqueue, then rq->load.weight is 0. Normally 
| > > sched_slice() is only used on a busy runqueue. So the correct fixup 
| > > site is not in sched_slice() but in sys_sched_rr_get_interval() - i'm 
| > > working on the right fix, i hope to be able to send a pull request in 
| > > a few minutes.
| > 
| > the problem is on UP too - if there are no SCHED_OTHER tasks. I've 
| > tested the fix and it solves the problem for various combinations of 
| > crash.c. I've updated sched.git, please pull it from:
| > 
| >    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
| > 
| > It has another commit besides this fix. Thanks,
| 
| Can you make up something that I can apply for 2.6.23-stable?  or is
| this not an issue on that tree?

 FWIW I couldn't reproduce the problem with 2.6.23.9. sched_slice()
is quite different on that kernel and _maybe_ it will never divide
by zero.

 My original report on vendor-sec was wrong. I said that 2.6.23.9
had the same bug, but it turns out the kernel I tested had Ingo's
CFS backport patch applied. I didn't know that; I thought it was a
vanilla kernel.

 Btw, I think it's important to release a new CFS backport patch
because some distros may be using it (the Mandriva stable kernel is
using the CFS backport patch, but we haven't updated to the latest
version yet).

-- 
Luiz Fernando N. Capitulino
--

From: Diego Calleja
Date: Tuesday, December 4, 2007 - 5:23 pm

As usual, if anyone finds errors in http://kernelnewbies.org/Linux_2_6_24 ,
let me know or fix them yourself.
--
