ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc6...
2.6.23-rc6-mm1 is a 29MB diff against 2.6.23-rc6.
It took me over two solid days to get this lot compiling and booting on a few
boxes. This required around ninety fixup patches and patch droppings. There
are several bugs in here which I know of (details below) and presumably many
more which I don't know of. I have to say that this just isn't working any
more.- The Vaio hangs when quitting X due to x86_64-mm-cpa-clflush.patch, but
I didn't drop that patch because the iommu patch series depends on it.- The Vaio also hangs during resume-from-RAM, due to git-acpi.patch
- And it hangs during suspend-to-RAM, due to git-acpi.patch
- Huge churn in networking. Many wireless drivers will be busted, and b44.c
has been disabled altogether. If you need b44, then good luck working out
what was done to it.- Added Peter's writeback balancing changes. Might help people who are
seeing large latency during heavy writeback.- Major changes in the basically-unmaintained IPC code. Needs careful
testing and reviewing.- Matthias fixed the mm-to-git importer, so the below-referenced git tree
should hopefully be working again.- Pleeeeeze do try to cc the appropriate developer(s) and mailing lists
when reporting problems, thanks.Boilerplate:
- See the `hot-fixes' directory for any important updates to this patchset.
- To fetch an -mm tree using git, use (for example)
git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1- -mm kernel commit activity can be reviewed by subscribing to the
mm-commits mailing list.echo "subscribe mm-commits" | mail majordomo@vger.kernel.org
- If you hit a bug in -mm and it is not obvious which patch caused it, it is
most valuable if you can perform a bisection search to identify which patch
introduced the bug. In...
This patch does the following things:
- Make hidp_setup_input() return int to indicate errors.
- Check its return value to handle errors.Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
---
net/bluetooth/hidp/core.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)Index: linux-2.6.23-rc6-mm1/net/bluetooth/hidp/core.c
===================================================================
--- linux-2.6.23-rc6-mm1.orig/net/bluetooth/hidp/core.c
+++ linux-2.6.23-rc6-mm1/net/bluetooth/hidp/core.c
@@ -625,7 +625,7 @@ static struct device *hidp_get_device(st
return conn ? &conn->dev : NULL;
}-static inline void hidp_setup_input(struct hidp_session *session, struct hidp_connadd_req *req)
+static inline int hidp_setup_input(struct hidp_session *session, struct hidp_connadd_req *req)
{
struct input_dev *input = session->input;
int i;
@@ -669,7 +669,7 @@ static inline void hidp_setup_input(struinput->event = hidp_input_event;
- input_register_device(input);
+ return input_register_device(input);
}static int hidp_open(struct hid_device *hid)
@@ -823,7 +823,8 @@ int hidp_add_connection(struct hidp_conn
session->idle_to = req->idle_to;if (session->input)
- hidp_setup_input(session, req);
+ if ((err = (hidp_setup_input(session, req))))
+ goto failed;if (session->hid)
hidp_setup_hid(session, req);
-
This is confusing, why not just do
if (session->input) {
err = hidp_setup_input(session, req);
if (err)
goto failed;-
lets use "if (err < 0)" and I am okay with that patch.
Regards
Marcel
-
This patch does the following things:
- Make hidp_setup_input() return int to indicate errors.
- Check its return value to handle errors.Thanks to roel for comments.
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
---
net/bluetooth/hidp/core.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)Index: linux-2.6.23-rc6-mm1/net/bluetooth/hidp/core.c
===================================================================
--- linux-2.6.23-rc6-mm1.orig/net/bluetooth/hidp/core.c
+++ linux-2.6.23-rc6-mm1/net/bluetooth/hidp/core.c
@@ -625,7 +625,7 @@ static struct device *hidp_get_device(st
return conn ? &conn->dev : NULL;
}-static inline void hidp_setup_input(struct hidp_session *session, struct hidp_connadd_req *req)
+static inline int hidp_setup_input(struct hidp_session *session, struct hidp_connadd_req *req)
{
struct input_dev *input = session->input;
int i;
@@ -669,7 +669,7 @@ static inline void hidp_setup_input(struinput->event = hidp_input_event;
- input_register_device(input);
+ return input_register_device(input);
}static int hidp_open(struct hid_device *hid)
@@ -822,8 +822,11 @@ int hidp_add_connection(struct hidp_conn
session->flags = req->flags & (1 << HIDP_BLUETOOTH_VENDOR_ID);
session->idle_to = req->idle_to;- if (session->input)
- hidp_setup_input(session, req);
+ if (session->input) {
+ err = hidp_setup_input(session, req);
+ if (err)
+ goto failed;
+ }if (session->hid)
hidp_setup_hid(session, req);
-
Yes, you are right. Thanks. I will resend this patch. ;)
--
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."-
This bug appears in 2.6.23-rc3-mm1, too.
The message:
[ 3267.844826] NMI Watchdog detected LOCKUP on CPU 0
[ 3267.849515] CPU 0
[ 3267.851525] Modules linked in: binfmt_misc ipt_MASQUERADE iptable_mangle iptable_nat nf_conntrack_ipv4 iptable_filter ip_tables x_tables nf_nat_tftp nf_nat_ftp nf_nat nf_conntrack_tftp nf_conntrack_ftp nf_conntrack nfnetlink fan ac battery ipv6 eeprom lm85 hwmon_vid i2c_core tun fuse kvm snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd sg soundcore snd_page_alloc thermal sr_mod pcspkr evdev button processor cdrom
[ 3267.889547] Pid: 13507, comm: gcc Not tainted 2.6.23-rc6-mm1 #4
[ 3267.895442] RIP: 0033:[<00002ab84e34cd44>] [<00002ab84e34cd44>]
[ 3267.901438] RSP: 002b:00007fff5c9e03f8 EFLAGS: 00000287
[ 3267.906726] RAX: 0000000000000000 RBX: 00007fff5c9e0580 RCX: 0000000000000000
[ 3267.913833] RDX: 0000000000000013 RSI: 00007fff5c9e0680 RDI: 00000000012a7010
[ 3267.920939] RBP: 00007fff5c9e0550 R08: 0000000000000050 R09: 0000000000000000
[ 3267.928045] R10: 0000000000000000 R11: 00000000012a7410 R12: 0000000000000002
[ 3267.935151] R13: 0000000000000003 R14: 0000000000000005 R15: 000000000000001f
[ 3267.942258] FS: 00002ab84f144170(0000) GS:ffffffff814f3000(0000) knlGS:0000000000000000
[ 3267.950317] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3267.956038] CR2: 00002ab84e3a7430 CR3: 000000000d618000 CR4: 00000000000006e0
[ 3267.963144] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3267.970250] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3267.977357] Process gcc (pid: 13507, threadinfo ffff81000ebe6000, task ffff810008b849d0)
[ 3267.985416]
[ 3267.997480] Unable to handle kernel paging request at 00000000fffffffe RIP:
[ 3268.002082] [<00000000fffffffe>]
[ 3268.007827] PGD ea85067 PUD 0
[ 3268.010887] Oops: 0010 [1] SMP
[ 3268.014035] last sysfs file: /devices/pci0000:00/0000:00:1e.0/0000:05:04.0/resource
[ 3268.021662] CPU 0
[ 3268.023674] Modules linked i...
Looks like it oopsed in the middle of handling an NMI watchdog expiry,
That's interensting. serial_in(). We have had NMI watchdog expiries when
the kernel is printing a large amount of stuff out a slow serial port with
interrutps disabled. But I thought we'd pretty much plugged those problems
by sprinkling touch_nmi_watchdog() in various places.Do you think this is what was happening on your system?
-
Very likely. I'm running linux with cmdline
"root=/dev/sda1 ro nmi_watchdog=1 console=ttyS0,115200 console=tty0",
and doing a lot of printks at the time ;-)Thank you,
Fengguang-
OK. We need to find a suitable place to poke yet another
touch_nmi_watchdog(). Maybe we should give up and put one in
printk().And you oopsed for different reasons in the nmi-watchdog handling
code too. I think I'll pretend I didn't see that.-
Let's forget it for now.
I can try Ingo's latency tracing patches at some convenient time.-
drivers/net/iseries_veth.c: In function 'veth_transmit_to_many':
drivers/net/iseries_veth.c:1174: warning: unused variable 'port'Signed-off-by: Satyam Sharma <satyam@infradead.org>
---
drivers/net/iseries_veth.c | 1 -
1 file changed, 1 deletion(-)diff -ruNp a/drivers/net/iseries_veth.c b/drivers/net/iseries_veth.c
--- a/drivers/net/iseries_veth.c 2007-09-22 06:26:39.000000000 +0530
+++ b/drivers/net/iseries_veth.c 2007-09-22 11:17:29.000000000 +0530
@@ -1171,7 +1171,6 @@ static void veth_transmit_to_many(struct
HvLpIndexMap lpmask,
struct net_device *dev)
{
- struct veth_port *port = (struct veth_port *) dev->priv;
int i, success, error;success = error = 0;
-
Of ethtool_ops->get_stats_count and ethtool_ops->get_ethtool_stats.
Signed-off-by: Satyam Sharma <satyam@infradead.org>
---
drivers/net/mv643xx_eth.c | 2 --
1 file changed, 2 deletions(-)diff -ruNp a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
--- a/drivers/net/mv643xx_eth.c 2007-09-22 06:26:39.000000000 +0530
+++ b/drivers/net/mv643xx_eth.c 2007-09-22 13:15:41.000000000 +0530
@@ -2740,8 +2740,6 @@ static const struct ethtool_ops mv643xx_
.get_stats_count = mv643xx_get_stats_count,
.get_ethtool_stats = mv643xx_get_ethtool_stats,
.get_strings = mv643xx_get_strings,
- .get_stats_count = mv643xx_get_stats_count,
- .get_ethtool_stats = mv643xx_get_ethtool_stats,
.nway_reset = mv643xx_eth_nway_restart,
};-
The netfilter sysctls in the bridging code don't set strategy routines:
sysctl table check failed: /net/bridge/bridge-nf-call-arptables .3.10.1 Missing strategy
sysctl table check failed: /net/bridge/bridge-nf-call-iptables .3.10.2 Missing strategy
sysctl table check failed: /net/bridge/bridge-nf-call-ip6tables .3.10.3 Missing strategy
sysctl table check failed: /net/bridge/bridge-nf-filter-vlan-tagged .3.10.4 Missing strategy
sysctl table check failed: /net/bridge/bridge-nf-filter-pppoe-tagged .3.10.5 Missing strategyThese binary sysctls can't work. The binary sysctl numbers of
other netfilter sysctls with this problem are being removed. These
need to go as well.Signed-off-by: Joseph Fannin <jfannin@gmail.com>
---
This *really* needs to be reviewed by someone who knows what this
is all about. I've simply extended the removal of netfilter binary
sysctl numbers so I could load bridge.ko. I don't particularly
care if I get attributed for this fix or any of that.It Works For Me.
diff -ru linux-2.6.23-rc6-mm1.orig/net/bridge/br_netfilter.c linux-2.6.23-rc6-mm1/net/bridge/br_netfilter.c
--- linux-2.6.23-rc6-mm1.orig/net/bridge/br_netfilter.c 2007-09-19 02:40:49.000000000 -0400
+++ linux-2.6.23-rc6-mm1/net/bridge/br_netfilter.c 2007-09-20 20:31:41.000000000 -0400
@@ -904,7 +904,6 @@static ctl_table brnf_table[] = {
{
- .ctl_name = NET_BRIDGE_NF_CALL_ARPTABLES,
.procname = "bridge-nf-call-arptables",
.data = &brnf_call_arptables,
.maxlen = sizeof(int),
@@ -912,7 +911,6 @@
.proc_handler = &brnf_sysctl_call_tables,
},
{
- .ctl_name = NET_BRIDGE_NF_CALL_IPTABLES,
.procname = "bridge-nf-call-iptables",
.data = &brnf_call_iptables,
.maxlen = sizeof(int),
@@ -920,7 +918,6 @@
.proc_handler = &brnf_sysctl_call_tables,
},
{
- .ctl_name = NET_BRIDGE_NF_CALL_IP6TABLES,
.procname = "bridge-nf-call-ip6tables",
.data = &brnf_call_ip6tables,
.maxlen = sizeof(int),
@@ -...
Hmm. This is an interesting case. The proc method is forcing
the integer to be either 0 or 1 in a racy fashion. But none of the
users appear to depend upon that.So this is the least broken set of binary sysctls I have seen caught
by my check.A really good fix would be to remove the binary side and then to
modify brnf_sysctl_call_tables to allocate a temporary ctl_table and
integer on the stack and only set ctl->data after we have normalized
the written value. But since in practice nothing cares about
the race a better fix probably isn't worth it.Eric
-
I seem to be missing something, the entire brnf_sysctl_call_tables
thing looks purely cosmetic to me, wouldn't it be better to simply
remove it?-
Well it is cosmetic in a user space visible way. Which means I don't
have a clue which if any user space programs or scripts care if we change
the behavior.I just looked in the git history and brnf_sysctl_call_tables has been
that way since sysctl support was added to the bridge netfilter code.The only comment I can found about the addition is:
2003/12/24 19:32:34-08:00 bdschuym
[BRIDGE]: Add 4 sysctl entries for bridge netfilter behavioral control:
bridge-nf-call-arptables - pass or don't pass bridged ARP traffic to
arptables' FORWARD chain.
bridge-nf-call-iptables - pass or don't pass bridged IPv4 traffic to
iptables' chains.
bridge-nf-filter-vlan-tagged - pass or don't pass bridged vlan-tagged
ARP/IP traffic to arptables/iptables.So since forcing the values to 0 or 1 doesn't seem hard to maintain
I am uncomfortable with removing that check.Eric
-
OK lets keep it then. Fixing the race seems overkill to me though.
-
Me to.
Eric
-
On Mon, 24 Sep 2007 18:55:38 +0200
I agree, removing seems like a better option. But probably need to go
through a 3-6mo warning period, since sysctl's are technically an API.--
Stephen Hemminger <shemminger@linux-foundation.org>-
I meant removing brnf_sysctl_call_tables function, not the sysctls
themselves, all it does is change values != 0 to 1. Or did you
actually mean that something in userspace might depend on reading
back the value 1 after writing a value != 0?
-
On Tue, 25 Sep 2007 06:07:24 +0200
I was going farther, because don't really see the value of having
a sysctl for this. It seems better to just not load filters if
they aren't going to be used. Having another enable/disable hook
just adds needless complexity.
-
These sysctls control whether bridged packets will be handled
by iptables and friends. The bridge netfilter code always
handles bridged packets, and iptables might be loaded for
different reasons. So I don't see how that would work.I think it should be specified in the ebtables ruleset, but
the current netfilter infrastructure doesn't allow to do that
cleanly.-
allmodconfig on ppc64 fails to build with the following error
drivers/block/ps3disk.c: In function `ps3disk_probe':
drivers/block/ps3disk.c:509: error: implicit declaration of function `blk_queue_issue_flush_fn'
make[2]: *** [drivers/block/ps3disk.o] Error 1
make[1]: *** [drivers/block] Error 2
make: *** [drivers] Error 2The problem seems to be coming from git-block.patch. Jens, glancing through
the patch, the function blk_queue_issue_flush_fn() seems to be have been
made redundant. Based on that, this looks like the correct fix but it needs
a review. ThanksSigned-off-by: Mel Gorman <mel@csn.ul.ie>
---
drivers/block/ps3disk.c | 21 ---------------------
1 file changed, 21 deletions(-)diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc6-mm1-030_fix_ppc64_sata/drivers/block/ps3disk.c linux-2.6.23-rc6-mm1-035_fix_ppc64_ps3disk/drivers/block/ps3disk.c
--- linux-2.6.23-rc6-mm1-030_fix_ppc64_sata/drivers/block/ps3disk.c 2007-09-11 03:50:29.000000000 +0100
+++ linux-2.6.23-rc6-mm1-035_fix_ppc64_ps3disk/drivers/block/ps3disk.c 2007-09-20 14:17:43.000000000 +0100
@@ -414,26 +414,6 @@ static void ps3disk_prepare_flush(struct
req->cmd_type = REQ_TYPE_FLUSH;
}-static int ps3disk_issue_flush(struct request_queue *q, struct gendisk *gendisk,
- sector_t *sector)
-{
- struct ps3_storage_device *dev = q->queuedata;
- struct request *req;
- int res;
-
- dev_dbg(&dev->sbd.core, "%s:%u\n", __func__, __LINE__);
-
- req = blk_get_request(q, WRITE, __GFP_WAIT);
- ps3disk_prepare_flush(q, req);
- res = blk_execute_rq(q, gendisk, req, 0);
- if (res)
- dev_err(&dev->sbd.core, "%s:%u: flush request failed %d\n",
- __func__, __LINE__, res);
- blk_put_request(req);
- return res;
-}
-
-
static unsigned long ps3disk_mask;static DEFINE_MUTEX(ps3disk_mask_mutex);
@@ -506,7 +486,6 @@ static int __devinit ps3disk_probe(struc
blk_queue_dma_alignment(queue, dev->blk_size-1);
blk_queue_hardsect_size(queue, dev->blk_size...
BTW ppc64_defconfig didn't quite like 2.6.23-rc6-mm1 either ...
IIRC I got build failures in:drivers/ata/pata_scc.c
drivers/md/raid6int8.c
drivers/net/spider_net.c
drivers/net/pasemi_mac.c
drivers/pci/hotplug/rpadlpar_sysfs.cI was in a hurry so didn't record the errors, just noted down the files
and disabled them from .config and continued building ... I'll get back
to fixing these up tonight (if you don't do that by then already).Satyam
-
[PATCH -mm] pasemi_mac: Build fix after recent netdev stats changes
Unbreak the following:
drivers/net/pasemi_mac.c: In function 'pasemi_mac_clean_rx':
drivers/net/pasemi_mac.c:533: error: 'dev' undeclared (first use in this function)
drivers/net/pasemi_mac.c:533: error: (Each undeclared identifier is reported only once
drivers/net/pasemi_mac.c:533: error: for each function it appears in.)And remove an unused static function:
drivers/net/pasemi_mac.c: At top level:
drivers/net/pasemi_mac.c:89: warning: 'read_iob_reg' defined but not usedSigned-off-by: Satyam Sharma <satyam@infradead.org>
---
drivers/net/pasemi_mac.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)diff -ruNp a/drivers/net/pasemi_mac.c b/drivers/net/pasemi_mac.c
--- a/drivers/net/pasemi_mac.c 2007-09-22 06:26:39.000000000 +0530
+++ b/drivers/net/pasemi_mac.c 2007-09-22 13:03:04.000000000 +0530
@@ -85,11 +85,6 @@ MODULE_PARM_DESC(debug, "PA Semi MAC bitstatic struct pasdma_status *dma_status;
-static unsigned int read_iob_reg(struct pasemi_mac *mac, unsigned int reg)
-{
- return in_le32(mac->iob_regs+reg);
-}
-
static void write_iob_reg(struct pasemi_mac *mac, unsigned int reg,
unsigned int val)
{
@@ -530,8 +525,8 @@ static int pasemi_mac_clean_rx(struct pa
} else
skb->ip_summed = CHECKSUM_NONE;- dev->stats.rx_bytes += len;
- dev->stats.rx_packets++;
+ mac->netdev->stats.rx_bytes += len;
+ mac->netdev->stats.rx_packets++;skb->protocol = eth_type_trans(skb, mac->netdev);
netif_receive_skb(skb);
-
Fixing the above showed up another problem in another file of the
same driver (drivers/net/spider_net_ethtool.c)[PATCH -mm] spider_net_ethtool: Keep up with recent netdev stats changes
Unbreak the following:
drivers/net/spider_net_ethtool.c: In function 'spider_net_get_ethtool_stats':
drivers/net/spider_net_ethtool.c:160: error: structure has no member named 'netdev_stats'
drivers/net/spider_net_ethtool.c:161: error: structure has no member named 'netdev_stats'
drivers/net/spider_net_ethtool.c:162: error: structure has no member named 'netdev_stats'
drivers/net/spider_net_ethtool.c:163: error: structure has no member named 'netdev_stats'
drivers/net/spider_net_ethtool.c:164: error: structure has no member named 'netdev_stats'
drivers/net/spider_net_ethtool.c:165: error: structure has no member named 'netdev_stats'
drivers/net/spider_net_ethtool.c:166: error: structure has no member named 'netdev_stats'
make[2]: *** [drivers/net/spider_net_ethtool.o] Error 1Also do another ARRAY_SIZE() cleanup while at it.
Signed-off-by: Satyam Sharma <satyam@infradead.org>
---
drivers/net/spider_net_ethtool.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)diff -ruNp a/drivers/net/spider_net_ethtool.c b/drivers/net/spider_net_ethtool.c
--- a/drivers/net/spider_net_ethtool.c 2007-09-22 06:26:39.000000000 +0530
+++ b/drivers/net/spider_net_ethtool.c 2007-09-22 12:43:51.000000000 +0530
@@ -28,8 +28,6 @@
#include "spider_net.h"-#define SPIDER_NET_NUM_STATS 13
-
static struct {
const char str[ETH_GSTRING_LEN];
} ethtool_stats_keys[] = {
@@ -149,7 +147,7 @@ spider_net_ethtool_get_ringparam(structstatic int spider_net_get_stats_count(struct net_device *netdev)
{
- return SPIDER_NET_NUM_STATS;
+ return ARRAY_SIZE(ethtool_stats_keys);
}static void spider_net_get_ethtool_stats(struct net_device *netdev,
@@ -157,13 +155,13 @@ static void spider_net_get_ethtool_stats
{
struct spider_net_card *card = netdev->priv;
...
[PATCH -mm] spider_net: Misc build fixes after recent netdev stats changes
Unbreak the following:
drivers/net/spider_net.c: In function 'spider_net_release_tx_chain':
drivers/net/spider_net.c:818: error: 'dev' undeclared (first use in this function)
drivers/net/spider_net.c:818: error: (Each undeclared identifier is reported only once
drivers/net/spider_net.c:818: error: for each function it appears in.)
drivers/net/spider_net.c: In function 'spider_net_xmit':
drivers/net/spider_net.c:922: error: 'dev' undeclared (first use in this function)
drivers/net/spider_net.c: In function 'spider_net_pass_skb_up':
drivers/net/spider_net.c:1018: error: 'dev' undeclared (first use in this function)
drivers/net/spider_net.c: In function 'spider_net_decode_one_descr':
drivers/net/spider_net.c:1215: error: 'dev' undeclared (first use in this function)
make[2]: *** [drivers/net/spider_net.o] Error 1Signed-off-by: Satyam Sharma <satyam@infradead.org>
---
drivers/net/spider_net.c | 24 +++++++++++-------------
1 file changed, 11 insertions(+), 13 deletions(-)diff -ruNp a/drivers/net/spider_net.c b/drivers/net/spider_net.c
--- a/drivers/net/spider_net.c 2007-09-22 06:26:39.000000000 +0530
+++ b/drivers/net/spider_net.c 2007-09-22 12:12:23.000000000 +0530
@@ -795,6 +795,7 @@ spider_net_set_low_watermark(struct spid
static int
spider_net_release_tx_chain(struct spider_net_card *card, int brutal)
{
+ struct net_device *dev = card->netdev;
struct spider_net_descr_chain *chain = &card->tx_chain;
struct spider_net_descr *descr;
struct spider_net_hw_descr *hwdescr;
@@ -919,7 +920,7 @@ spider_net_xmit(struct sk_buff *skb, str
spider_net_release_tx_chain(card, 0);if (spider_net_prepare_tx_descr(card, skb) != 0) {
- dev->stats.tx_dropped++;
+ netdev->stats.tx_dropped++;
netif_stop_queue(netdev);
return NETDEV_TX_BUSY;
}
@@ -979,16 +980,12 @@ static void
spider_net_pass_skb_up(struct spider_net_descr *descr,
struct spider...
I've confirmed that this patch fixes the build error in question.
--
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
-
This turned out to be a gcc bug -- I was using an old cross-compiler.
-
Thanks! The patch is correct, the prepare_flush() hook is now all that
is needed. I will apply this to my barrier branch, where this originates
from.--
Jens Axboe-
PPC64 building allmodconfig fails to compile drivers/ata/pata_scc.c . It
doesn't show up on other arches because this driver is specific to the
architecture.drivers/ata/pata_scc.c: In function `scc_bmdma_status':
drivers/ata/pata_scc.c:734: error: structure has no member named `active_tag'
drivers/ata/pata_scc.c: In function `scc_pata_prereset':
drivers/ata/pata_scc.c:866: warning: passing arg 1 of `ata_std_prereset' from incompatible pointer type
drivers/ata/pata_scc.c: In function `scc_error_handler':
drivers/ata/pata_scc.c:908: warning: passing arg 2 of `ata_bmdma_drive_eh' from incompatible pointer type
drivers/ata/pata_scc.c:908: warning: passing arg 3 of `ata_bmdma_drive_eh' from incompatible pointer type
drivers/ata/pata_scc.c:908: warning: passing arg 5 of `ata_bmdma_drive_eh' from incompatible pointer type
make[2]: *** [drivers/ata/pata_scc.o] Error 1
make[1]: *** [drivers/ata] Error 2
make: *** [drivers] Error 2The problem seems to be in git-libata-all.patch but based on similar
changes in this patch, the following patch should be the fix.Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
drivers/ata/pata_scc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc6-mm1-025_fix_ppc64_kgdb/drivers/ata/pata_scc.c linux-2.6.23-rc6-mm1-030_fix_ppc64_sata/drivers/ata/pata_scc.c
--- linux-2.6.23-rc6-mm1-025_fix_ppc64_kgdb/drivers/ata/pata_scc.c 2007-09-18 11:29:26.000000000 +0100
+++ linux-2.6.23-rc6-mm1-030_fix_ppc64_sata/drivers/ata/pata_scc.c 2007-09-20 11:51:01.000000000 +0100
@@ -731,7 +731,7 @@ static u8 scc_bmdma_status (struct ata_p
void __iomem *mmio = ap->ioaddr.bmdma_addr;
u8 host_stat = in_be32(mmio + SCC_DMA_STATUS);
u32 int_status = in_be32(mmio + SCC_DMA_INTST);
- struct ata_queued_cmd *qc = ata_qc_from_tag(ap, ap->active_tag);
+ struct ata_queued_cmd *qc = ata_qc_from_tag(ap, ap->link.active_tag);
static int retry = 0;/* return if IOS_SS is cleared */
--
Mel...
On Thu, 20 Sep 2007 14:13:15 +0100
Its not been updated to match the libata core changes. Try something like
this. Whoever is maintaining it should also remove the prereset cable handling
code and use the proper cable detect method.Signed-off-by: Alan Cox <alan@redhat.com>
diff -u --exclude-from /usr/src/exclude --new-file --recursive linux.vanilla-2.6.23rc6-mm1/drivers/ata/pata_scc.c linux-2.6.23rc6-mm1/drivers/ata/pata_scc.c
--- linux.vanilla-2.6.23rc6-mm1/drivers/ata/pata_scc.c 2007-09-18 15:32:51.000000000 +0100
+++ linux-2.6.23rc6-mm1/drivers/ata/pata_scc.c 2007-09-20 14:23:32.879807760 +0100
@@ -731,7 +731,7 @@
void __iomem *mmio = ap->ioaddr.bmdma_addr;
u8 host_stat = in_be32(mmio + SCC_DMA_STATUS);
u32 int_status = in_be32(mmio + SCC_DMA_INTST);
- struct ata_queued_cmd *qc = ata_qc_from_tag(ap, ap->active_tag);
+ struct ata_queued_cmd *qc = ata_qc_from_tag(ap, ap->link.active_tag);
static int retry = 0;/* return if IOS_SS is cleared */
@@ -860,10 +860,10 @@
* @deadline: deadline jiffies for the operation
*/-static int scc_pata_prereset(struct ata_port *ap, unsigned long deadline)
+static int scc_pata_prereset(struct ata_link *link, unsigned long deadline)
{
- ap->cbl = ATA_CBL_PATA80;
- return ata_std_prereset(ap, deadline);
+ link->ap->cbl = ATA_CBL_PATA80;
+ return ata_std_prereset(link, deadline);
}/**
@@ -874,8 +874,9 @@
* Note: Original code is ata_std_postreset().
*/-static void scc_std_postreset (struct ata_port *ap, unsigned int *classes)
+static void scc_std_postreset (struct ata_link *link, unsigned int *classes)
{
+ struct ata_port *ap = link->ap;
DPRINTK("ENTER\n");/* is double-select really necessary? */
-
Hi,
It appears you forgot to fix scc_std_softreset() and one of its callsites
in scc_bdma_stop(). A complete patch is attempted below -- please review.[PATCH -mm] pata_scc: Keep up with libata core API changes
Little fixlets, that the build started erroring / warning about:
drivers/ata/pata_scc.c: In function 'scc_bmdma_status':
drivers/ata/pata_scc.c:734: error: structure has no member named 'active_tag'
drivers/ata/pata_scc.c: In function 'scc_pata_prereset':
drivers/ata/pata_scc.c:866: warning: passing arg 1 of 'ata_std_prereset' from incompatible pointer type
drivers/ata/pata_scc.c: In function 'scc_error_handler':
drivers/ata/pata_scc.c:908: warning: passing arg 2 of 'ata_bmdma_drive_eh' from incompatible pointer type
drivers/ata/pata_scc.c:908: warning: passing arg 3 of 'ata_bmdma_drive_eh' from incompatible pointer type
drivers/ata/pata_scc.c:908: warning: passing arg 5 of 'ata_bmdma_drive_eh' from incompatible pointer type
make[2]: *** [drivers/ata/pata_scc.o] Error 1Signed-off-by: Satyam Sharma <satyam@infradead.org>
Cc: Alan Cox <alan@redhat.com>
Cc: Mel Gorman <mel@skynet.ie>---
Andrew, this includes (supercedes) the previous two ones from Mel / Alan.
drivers/ata/pata_scc.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)diff -ruNp a/drivers/ata/pata_scc.c b/drivers/ata/pata_scc.c
--- a/drivers/ata/pata_scc.c 2007-09-22 06:26:37.000000000 +0530
+++ b/drivers/ata/pata_scc.c 2007-09-22 08:04:42.000000000 +0530
@@ -594,16 +594,17 @@ static unsigned int scc_bus_softreset(st
* Note: Original code is ata_std_softreset().
*/-static int scc_std_softreset (struct ata_port *ap, unsigned int *classes,
- unsigned long deadline)
+static int scc_std_softreset(struct ata_link *link, unsigned int *classes,
+ unsigned long deadline)
{
+ struct ata_port *ap = link->ap;
unsigned int slave_possible = ap->flags & ATA_FLAG_SLAVE_POSS;
u...
applied
-
I can confirm it builds without warnings or errors, so thanks. I'm not in
--
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
-
Hi,
This patch solves the build failure, but with following warnings
CC drivers/ata/pata_scc.o
drivers/ata/pata_scc.c: In function ‘scc_error_handler’:
drivers/ata/pata_scc.c:909: warning: passing argument 3 of
‘ata_bmdma_drive_eh’ from incompatible pointer typeand after that the build fails with
CC [M] drivers/net/spider_net.o
drivers/net/spider_net.c: In function ‘spider_net_release_tx_chain’:
drivers/net/spider_net.c:818: error: ‘dev’ undeclared (first use in this
function)
drivers/net/spider_net.c:818: error: (Each undeclared identifier is
reported only once
drivers/net/spider_net.c:818: error: for each function it appears in.)
drivers/net/spider_net.c: In function ‘spider_net_xmit’:
drivers/net/spider_net.c:922: error: ‘dev’ undeclared (first use in this
function)
drivers/net/spider_net.c: In function ‘spider_net_pass_skb_up’:
drivers/net/spider_net.c:1018: error: ‘dev’ undeclared (first use in
this function)
drivers/net/spider_net.c: In function ‘spider_net_decode_one_descr’:
drivers/net/spider_net.c:1215: error: ‘dev’ undeclared (first use in
this function)
make[2]: *** [drivers/net/spider_net.o] Error 1
make[1]: *** [drivers/net] Error 2
make: *** [drivers] Error 2--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.-
I can confirm it builds with the following messages
CC [M] drivers/ata/pata_scc.o
drivers/ata/pata_scc.c: In function `scc_error_handler':
drivers/ata/pata_scc.c:909: warning: passing arg 3 of `ata_bmdma_drive_eh' from incompatible pointer typeAs for the rest, I cannot comment.
--
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
-
Hello !
make-access-to-tasks-nsproxy-lighter.patch breaks unshare()
when called from unshare(), switch_task_namespaces() takes an
extra refcount on the nsproxy, leading to a memory leak of
nsproxy objects.Now the problem is that we still need that extra ref when called
from daemonize(). Here's an ugly fix for it.Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
---
include/linux/nsproxy.h | 5 +++++
kernel/exit.c | 2 ++
kernel/nsproxy.c | 7 -------
3 files changed, 7 insertions(+), 7 deletions(-)Index: 2.6.23-rc6-mm1/kernel/nsproxy.c
===================================================================
--- 2.6.23-rc6-mm1.orig/kernel/nsproxy.c
+++ 2.6.23-rc6-mm1/kernel/nsproxy.c
@@ -25,11 +25,6 @@ static struct kmem_cache *nsproxy_cachepstruct nsproxy init_nsproxy = INIT_NSPROXY(init_nsproxy);
-static inline void get_nsproxy(struct nsproxy *ns)
-{
- atomic_inc(&ns->count);
-}
-
/*
* creates a copy of "orig" with refcount 1.
*/
@@ -208,8 +203,6 @@ void switch_task_namespaces(struct task_
if (ns == new)
return;- if (new)
- get_nsproxy(new);
rcu_assign_pointer(p->nsproxy, new);if (ns && atomic_dec_and_test(&ns->count)) {
Index: 2.6.23-rc6-mm1/kernel/exit.c
===================================================================
--- 2.6.23-rc6-mm1.orig/kernel/exit.c
+++ 2.6.23-rc6-mm1/kernel/exit.c
@@ -408,6 +408,8 @@ void daemonize(const char *name, ...)
current->fs = fs;
atomic_inc(&fs->count);+ if (current->nsproxy != init_task.nsproxy)
+ get_nsproxy(init_task.nsproxy);
switch_task_namespaces(current, init_task.nsproxy);exit_files(current);
Index: 2.6.23-rc6-mm1/include/linux/nsproxy.h
===================================================================
--- 2.6.23-rc6-mm1.orig/include/linux/nsproxy.h
+++ 2.6.23-rc6-mm1/include/linux/nsproxy.h
@@ -77,6 +77,11 @@ static inline void put_nsproxy(struct ns
}
}+static inline void get_...
Looks sane :)
shouldn't we make the switch under this if() as well?
-
right. we can probably simplify switch_task_namespaces() and remove :
if (ns == new)
return;I'll cook a better one today.
Thanks !
C.
-
So I removed this test in
* daemonize() bc it is already done
* sys_unshare() bc the nsproxy is always new one
* exit_task_namespaces() bc it is called with NULL and the
task will die right after that.C.
make-access-to-tasks-nsproxy-lighter.patch breaks unshare()
when called from unshare(), switch_task_namespaces() takes an
extra refcount on the nsproxy, leading to a memory leak of
nsproxy objects.Now the problem is that we still need that extra ref when called
from daemonize(). Here's an ugly fix for it.Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>---
include/linux/nsproxy.h | 5 +++++
kernel/exit.c | 5 ++++-
kernel/nsproxy.c | 9 ---------
3 files changed, 9 insertions(+), 10 deletions(-)Index: 2.6.23-rc6-mm1/kernel/nsproxy.c
===================================================================
--- 2.6.23-rc6-mm1.orig/kernel/nsproxy.c
+++ 2.6.23-rc6-mm1/kernel/nsproxy.c
@@ -25,11 +25,6 @@ static struct kmem_cache *nsproxy_cachepstruct nsproxy init_nsproxy = INIT_NSPROXY(init_nsproxy);
-static inline void get_nsproxy(struct nsproxy *ns)
-{
- atomic_inc(&ns->count);
-}
-
/*
* creates a copy of "orig" with refcount 1.
*/
@@ -205,11 +200,7 @@ void switch_task_namespaces(struct task_
might_sleep();ns = p->nsproxy;
- if (ns == new)
- return;- if (new)
- get_nsproxy(new);
rcu_assign_pointer(p->nsproxy, new);if (ns && atomic_dec_and_test(&ns->count)) {
Index: 2.6.23-rc6-mm1/kernel/exit.c
===================================================================
--- 2.6.23-rc6-mm1.orig/kernel/exit.c
+++ 2.6.23-rc6-mm1/kernel/exit.c
@@ -408,7 +408,10 @@ void daemonize(const char *name, ...)
current->fs = fs;
...
Looks good. Thanks for catching the leak.
-
[patch submitter cc'd]
I had to reverse
convert-cpu_sibling_map-to-a-per_cpu-data-array-ppc64.patch (and the
fix on top of it) to get past this error on 32bit powerpc:make[1]: Entering directory `/usr/src/ctesiphon/linux-2.6.23-rc6-mm1'
CHK include/linux/version.h
CHK include/linux/utsrelease.h
CC arch/powerpc/kernel/asm-offsets.s
In file included from include/linux/smp.h:19,
from include/linux/sched.h:67,
from arch/powerpc/kernel/asm-offsets.c:17:
include/asm/smp.h:63: error: expected declaration specifiers or
On Wed, 19 Sep 2007 19:58:28 -0400
This, methinks.
--- a/include/asm-powerpc/smp.h~convert-cpu_sibling_map-to-a-per_cpu-data-array-ppc64-fix-2
+++ a/include/asm-powerpc/smp.h
@@ -25,8 +25,8 @@#ifdef CONFIG_PPC64
#include <asm/paca.h>
-#include <asm/percpu.h>
#endif
+#include <asm/percpu.h>extern int boot_cpuid;
(conditional includes are evil)
-
This is a multi-part message in MIME format.
--------------090208080409080002090402
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printableI get several "duplicate file name" messages.
Hope Greg's the right one to cc on these.<4>[ 21.211942] Duplicate file names "rtc" detected. [dump_trace+100/49=
8] dump_trace+0x64/0x1f2
<4>[ 21.216801] [show_trace_log_lvl+26/47] show_trace_log_lvl+0x1a/0x2=
f
<4>[ 21.221639] [show_trace+18/20] show_trace+0x12/0x14
<4>[ 21.226307] [dump_stack+22/24] dump_stack+0x16/0x18
<4>[ 21.231063] [proc_register+315/332] proc_register+0x13b/0x14c
<4>[ 21.235814] [create_proc_entry+114/135] create_proc_entry+0x72/0x8=
7
<4>[ 21.240562] [<faa1d123>] rtc_proc_add_device+0x1d/0x39 [rtc_core]
<4>[ 21.245304] [<faa1c1f9>] rtc_device_register+0x170/0x207 [rtc_core=
]
<4>[ 21.250077] [<faa3c5db>] cmos_do_probe+0x8d/0x1ee [rtc_cmos]
<4>[ 21.254715] [<faa3c77d>] cmos_pnp_probe+0x41/0x44 [rtc_cmos]
<4>[ 21.259477] [pnp_device_probe+102/135] pnp_device_probe+0x66/0x87
<4>[ 21.264206] [driver_probe_device+233/362] driver_probe_device+0xe9=
/0x16a
<4>[ 21.268738] [__driver_attach+108/165] __driver_attach+0x6c/0xa5
<4>[ 21.273387] [bus_for_each_dev+54/91] bus_for_each_dev+0x36/0x5b
<4>[ 21.277975] [driver_attach+25/27] driver_attach+0x19/0x1b
<4>[ 21.282501] [bus_add_driver+115/429] bus_add_driver+0x73/0x1ad
<4>[ 21.286984] [driver_register+103/108] driver_register+0x67/0x6c
<4>[ 21.291269] [pnp_register_driver+23/25] pnp_register_driver+0x17/0=
x19
<4>[ 21.295661] [<faa4000d>] cmos_init+0xd/0xf [rtc_cmos]
<4>[ 21.300044] [sys_init_module+5544/5953] sys_init_module+0x15a8/0x1=
741
<4>[ 21.304406] [sysenter_past_esp+107/181] sysenter_past_esp+0x6b/0xb=
5
<4>[ 21.308749] [phys_startup_32+3085673488/3221225472] 0...
Another data point. When booting into runlevel 3, the system is usable,
although these messages still appear after boot:<4>[ 22.731025] sysfs: duplicate filename 'bInterfaceNumber' can not be=
created
<4>[ 22.735872] WARNING: at fs/sysfs/dir.c:433 sysfs_add_one()
<7>[ 22.740737] bas_gigaset: <-------3: 0x11 (0 [0x00 0x00])
<4>[ 22.740881] [dump_trace+100/498] dump_trace+0x64/0x1f2
<7>[ 22.745708] bas_gigaset: <-------3: 0x11 (0 [0x00 0x00])
<4>[ 22.745721] [show_trace_log_lvl+26/47] show_trace_log_lvl+0x1a/0x2=
f
<4>[ 22.750533] [show_trace+18/20] show_trace+0x12/0x14
<4>[ 22.755285] [dump_stack+22/24] dump_stack+0x16/0x18
<4>[ 22.759997] [sysfs_add_one+87/188] sysfs_add_one+0x57/0xbc
<4>[ 22.764633] [sysfs_add_file+73/135] sysfs_add_file+0x49/0x87
<4>[ 22.769418] [sysfs_create_group+138/240] sysfs_create_group+0x8a/0=
xf0
<4>[ 22.774014] [<fa99c20a>] usb_create_sysfs_intf_files+0x2d/0xa2 [us=
bcore]
<4>[ 22.778659] [<fa9989ed>] usb_set_configuration+0x450/0x48d [usbcor=
e]
<4>[ 22.783246] [<fa99fa9b>] generic_probe+0x53/0x94 [usbcore]
<4>[ 22.787895] [<fa99a00b>] usb_probe_device+0x35/0x3b [usbcore]
<4>[ 22.792471] [driver_probe_device+233/362] driver_probe_device+0xe9=
/0x16a
<4>[ 22.797214] [__device_attach+8/10] __device_attach+0x8/0xa
<4>[ 22.801779] [bus_for_each_drv+56/97] bus_for_each_drv+0x38/0x61
<4>[ 22.806344] [device_attach+112/133] device_attach+0x70/0x85
<4>[ 22.810905] [bus_attach_device+41/119] bus_attach_device+0x29/0x77=<4>[ 22.815468] [device_add+795/1342] device_add+0x31b/0x53e
<4>[ 22.820052] [<fa993de7>] usb_new_device+0x5c/0x168 [usbcore]
<4>[ 22.824653] [<fa9954f9>] hub_thread+0x6b9/0xaa5 [usbcore]
<4>[ 22.829276] [kthread+59/100] kthread+0x3b/0x64
<4>[ 22.833823] [kernel_thread_helper+7/16] ...
On Thu, 20 Sep 2007 21:20:24 +0200
There was a locking imbalance in the IPC code. Do you have the fixes in
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc6...Greg isn't the only USB developer ;) linux-usb-devel cc added..
-
I hadn't. Now that I have, all the troubles are gone. X comes up
fine, and all of the segfault/invalid context/scheduling while atomic
messages have disappeared.Thanks,
Tilman--=20
Tilman Schmidt E-Mail: tilman@imap.cc
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Unge=F6ffnet mindestens haltbar bis: (siehe R=FCckseite)
On Thu, 20 Sep 2007 01:02:06 +0200
Nah, that's an rtc-specific problem.
I think David says that it's actually not a problem, but I didn't
really understand how this can be?Perhaps I'll need to drop that debugging patch. Which would be a shame,
because it can detect real bugs. Perhaps it needs a strcmp("rtc") to
filter out the (surprising) false positive.
-
RTC-related ... but it's a procfs bug, since it's procfs which doesn't
even bother to check for duplicate names before it registers files.
And it's that duplication which is the problem. Try the patch in
this messageI said it's not an RTC problem ... it's a procfs problem.
That _shouldn't_ be a problem at all; only one of them should be
able to bind to that hardware.The only problem I see in these messages is that procfs bug.
- Dave
-
On Wed, 19 Sep 2007 16:44:48 -0700
So you keep on claiming, but I don't think I've yet seen a description of
the *reason* why two copies of this file are being created, and aIt's not obvious that this is only a procfs bug. If some part of the
kernel tries to add a procfs file which is already there, that's often a
bug in the caller.Yes, procfs should have been checking for this. But it is too late now for
us to just fail out of the procfs registration code. Because this can
cause previously buggy-but-works-ok code to now fail completely.So I think the best we can do now is to retain the runtime warning and to
continue to "succeed" and to identify all the problematic codesites and to
either fix them or to convince ourselves that they really are working as
intended.
-
Although maybe you meant me to parse that differently ... as in,
not "why is procfs doing this broken thing?" but rather "how is it
that procfs fault (non)handling code ended up getting used?".That's always seemed self-evident: the RTC framework was creating
it for /dev/rtc0 (presumably, rtc-cmos), while at the same time the
legacy drivers/char/rtc.c was creating it for /dev/rtc.Workaround by configuring just one, and the problem goes away.
(Although it *ought* to be OK to configure both, with all the normalIt's not a wrong thing, at any rate. This is a fairly basic task
in any filesystem: mutual exclusion. Code is allowed to rely onNot really; procfs is supposed to not create it if it's already there!!
Reasonable callers will cope with "it didn't get created", and that'sWhat do you mean by too late "now" ... just-before-2.6.23?
I'm a bit puzzled why this issue cropped up suddenly, when the
"two RTC drivers" configs have been behaving fine (presumably
failing properly, but at least not generating problem reports)
for some time. One of the procfs changes must have caused this
trouble.And what would an example be of buggy-but-works code, which would
At a micro level, both the relevant call sites already have code
to tolerate the "couldn't create that file" error, which looked
correct at a quick reading.- Dave
-
Head still spinning.
a) "rtc0" != "rtc" ??
b) The kernel is trying to register two procfs files of the same name!
We're relying on procfs behaviour to prevent registration of two rtc
devices? That's a strange means of arbitration. And what happensNo.
What I mean is that we cannot change procfs as you propose until we have
located and fixed all places in the kernel which can cause duplicated
procfs names. (Good luck with that).Because we've had this happening before, and *the driver kept working*
(apart from, presumably, some of their procfs features which, presumably,
nobody used much).With the change which you propose, these drivers will probably stop
I'm not sure that anything _has_ changed. It's just that a few people have
suddenly noticed duplicated rtc entris in /proc. Perhaps theye were thereA driver which checks the return value from procfs registration and which
-
One key difference between any of the numerous legacy RTC drivers
and the new framework is that the legacy drivers tended to think
(wrongly!) they would be the only RTC in the system, while the
framework recognizes that other configurations are important.Hence names "rtc0", "rtc1", "rtc2" ... etc in framework drivers,
versus just "rtc" in legacy code. Easily tweaked with "udev" if it
matters, but current versions of "hwclock" (in util-linux and also
in busybox) don't require a /dev/rtc file any more ... they checkNot so much an "error" as minor awkwardness. The end result is
Not at all. But if two chunks of code try to create /proc/x/y/z,
only one of them should succeed. Today, there's a clear bug inThen nothing gets stored there; procfs is nonessential. The same
But causing duplicated names doesn't need any "fix" when that code
handles the error sanely -- by using its "PROCFS=n" code paths.If you're arguing that you think such fault handling code may have
a lot of bugs, I might agree: not all fault handling code gets even
basic testing. But that would seem to be nothing more than a reason
to want some code auditing (to see how common such bugs really are,
and fix at least a few of them), and perhaps to let such bugfixes sitEvidently there's no procfs maintainer ... you've said there are
patches in MM to detect and report such duplication. What is itThe last time I had occasion to look at such stuff I observed that procfs
users were generally sane. If they couldn't create a file, they justErroring out is appropriate for fatal errors ... but procfs errors
generally can't be fatal. (Code which uses procfs for its primary
userspace interface would be the exception.) So most code should
never error out on such faults.As I commented above, most code I've happend across will proceed
just fine if the extra procfs support is unavailable. After all,
it needs to work with CONFIG_PROCFS=n so those code paths aren't
going to suddenly stop working!- D...
On Wed, 19 Sep 2007 23:11:21 -0700
I guess the best solution would be to have Kconfig avoid that. I'll try
to code something as soon as I get back home.--
Best regards,
Alessandro Zummo,
Tower Technologies - Torino, Italy-
Isn't that all caused by the rtc driver registering itself without
checking if it is able to grab the device? Wouldn't moving the
request_resource() before doing any device registration do the magic?Kay
-
I suspect it would. Got patch?
I'm trying to remember why it registers before grabbing resources
(I/O region, irq) ... which is not the usual pattern, and is somewhat
of a surprise every time I notice it. Though everything's undone on
error, so it "should" work just fine.If that's not just some debugging leftover, the only thing that comes
to mind is that maybe it was to ensure that the resource labels in
/proc/{interrupts,ioports} clearly distinguished this from the legacy
driver. The now-conventional way to label things there is by using
a logical name like "rtc0", which is the output of registration.- Dave
-
AFAICT the rtc problem is caused by misconfiguration: both the new
and old rtc driver have been built and they are both trying to load.
-
Rats. Sorry. I remember now. That's not the first time I am hit by
that one. I had even made a resolution to try and find out the correct
options to set. So what are they? CONFIG_RTC=3Dy and CONFIG_RTC_DRV_CMOS=3D=
n?
Guess I should try that combination in my next round of tests.Anyway, shouldn't there be a warning or at least some clear instructions?=
As it is, both the old and the new driver claim in Kconfig that you need
them in order to access your RTC, and none of them even mentions the
other, so it's way too easy to end up enabling both of them.So are all my other woes a consequence of that one?
Thanks,
T.--=20
Tilman Schmidt E-Mail: tilman@imap.cc
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Unge=F6ffnet mindestens haltbar bis: (siehe R=FCckseite)
Did that now. It does get rid of the
"Duplicate file names "rtc" detected."--=20
Tilman Schmidt E-Mail: tilman@imap.cc
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Unge=F6ffnet mindestens haltbar bis: (siehe R=FCckseite)
Seems I have a case of a largish i386 NUMA (NUMA-Q) which has a mkfs
stuck in a 'D' wait:=======================
mkfs.ext2 D c10220f4 0 6233 6222
c344fc80 00000082 00000286 c10220f4 c344fc90 002ed099 c2963340 c2b9f640
c142bce0 c2b9f640 c344fc90 002ed099 c344fcfc c344fcc0 c1219563 c1109bf2
c344fcc4 c186e4d4 c186e4d4 002ed099 c1022612 c2b9f640 c186e000 c104000c
Call Trace:
[<c10220f4>] lock_timer_base+0x19/0x35
[<c1219563>] schedule_timeout+0x70/0x8d
[<c1109bf2>] prop_fraction_single+0x37/0x5d
[<c1022612>] process_timeout+0x0/0x5
[<c104000c>] task_dirty_limit+0x3a/0xb5
[<c12194da>] io_schedule_timeout+0x1e/0x28
[<c10454b4>] congestion_wait+0x62/0x7a
[<c102b021>] autoremove_wake_function+0x0/0x33
[<c10402af>] get_dirty_limits+0x16a/0x172
[<c102b021>] autoremove_wake_function+0x0/0x33
[<c104040b>] balance_dirty_pages+0x154/0x1be
[<c103bda3>] generic_perform_write+0x168/0x18a
[<c103be38>] generic_file_buffered_write+0x73/0x107
[<c103c346>] __generic_file_aio_write_nolock+0x47a/0x4a5
[<c11b0fef>] do_sock_write+0x92/0x99
[<c11b1048>] sock_aio_write+0x52/0x5e
[<c103c3b9>] generic_file_aio_write_nolock+0x48/0x9b
[<c105d2d6>] do_sync_write+0xbf/0xfc
[<c102b021>] autoremove_wake_function+0x0/0x33
[<c1010311>] do_page_fault+0x2cc/0x739
[<c105d3a0>] vfs_write+0x8d/0x108
[<c105d4c3>] sys_write+0x41/0x67
[<c100260a>] syscall_call+0x7/0xb
=======================This machine and others have run numerous test runs on this kernel and
this is the first time I've see a hang like this.I wonder if this is the ultimate cause of the couple of mainline hangs
which were seen, but not diagnosed.-apw
-
I've been seeing something like that on 4-way PPC64: in my case I've
shells hanging in D state trying to append to kernel build log on ext3
(the builds themselves going on elsewhere, in tmpfs): one of the shellsMy *guess* is that this is peculiar to 2.6.23-rc6-mm1, and from Peter's
mm-per-device-dirty-threshold.patch. printks showed bdi_nr_reclaimable
0, bdi_nr_writeback 24, bdi_thresh 1 in balance_dirty_pages (though I've
not done enough to check if those really correlate with the hangs),
and I'm wondering if the bdi_stat_sum business is needed on the
!nr_reclaimable path.So I'm running now with the patch below, good so far, but can't judge
until tomorrow whether it has actually addressed the problem seen.Not-yet-Signed-off-by: Hugh Dickins <hugh@veritas.com>
---
mm/page-writeback.c | 53 +++++++++++++++++++-----------------------
1 file changed, 24 insertions(+), 29 deletions(-)--- 2.6.23-rc6-mm1/mm/page-writeback.c 2007-09-18 12:28:25.000000000 +0100
+++ linux/mm/page-writeback.c 2007-09-19 20:00:46.000000000 +0100
@@ -379,7 +379,7 @@ static void balance_dirty_pages(struct a
bdi_nr_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
- break;
+ break;if (!bdi->dirty_exceeded)
bdi->dirty_exceeded = 1;
@@ -392,39 +392,34 @@ static void balance_dirty_pages(struct a
*/
if (bdi_nr_reclaimable) {
writeback_inodes(&wbc);
-
+ pages_written += write_chunk - wbc.nr_to_write;
get_dirty_limits(&background_thresh, &dirty_thresh,
&bdi_thresh, bdi);
+ }- /*
- * In order to avoid the stacked BDI deadlock we need
- * to ensure we accurately count the 'dirty' pages when
- * the threshold is low.
- *
- * Otherwise it would be possible to get thresh+n pages
- * reported dirty, even though there are thresh-m pages
- * actually dirty; with m+n sitting in the percp...
On Wed, 19 Sep 2007 21:03:19 +0100 (BST) Hugh Dickins
FWIW my tired brain seems to think it the !nr_reclaimable path needs it
-
Last night's run went well: that patch does indeed seem to have fixed it.
Looking at the timings (some variance but _very_ much less than the night
before), there does appear to be some other occasional slight slowdown -
but I've no reason to suspect your patch for it, nor to suppose it's
something new: it may just be an artifact of my heavy swap thrashing.[PATCH mm] mm per-device dirty threshold fix
Fix occasional hang when a task couldn't get out of balance_dirty_pages:
mm-per-device-dirty-threshold.patch needs to reevaluate bdi_nr_writeback
across all cpus when bdi_thresh is low, even in the case when there was
no bdi_nr_reclaimable.Signed-off-by: Hugh Dickins <hugh@veritas.com>
---
mm/page-writeback.c | 53 +++++++++++++++++++-----------------------
1 file changed, 24 insertions(+), 29 deletions(-)--- 2.6.23-rc6-mm1/mm/page-writeback.c 2007-09-18 12:28:25.000000000 +0100
+++ linux/mm/page-writeback.c 2007-09-19 20:00:46.000000000 +0100
@@ -379,7 +379,7 @@ static void balance_dirty_pages(struct a
bdi_nr_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
- break;
+ break;if (!bdi->dirty_exceeded)
bdi->dirty_exceeded = 1;
@@ -392,39 +392,34 @@ static void balance_dirty_pages(struct a
*/
if (bdi_nr_reclaimable) {
writeback_inodes(&wbc);
-
+ pages_written += write_chunk - wbc.nr_to_write;
get_dirty_limits(&background_thresh, &dirty_thresh,
&bdi_thresh, bdi);
+ }- /*
- * In order to avoid the stacked BDI deadlock we need
- * to ensure we accurately count the 'dirty' pages when
- * the threshold is low.
- *
- * Otherwise it would be possible to get thresh+n pages
- * reported dirty, even though there are thresh-m pages
- * actually dirty; with m+n sitting in the percpu
- * deltas.
- */
- if (bdi_thresh < 2*bdi_stat_error(bdi)) {
- ...
Thank you Hugh. I ran into similar problems with many dd(large file)
operations. This patch seems to fix it.But now my desktop was locked up again when writing a lot of small
files. The problem is repeatable with the command
$ ketchup 2.6.23-rc6-mm1I writeup two debug patches:
---
mm/page-writeback.c | 9 +++++++++
1 file changed, 9 insertions(+)--- linux-2.6.22.orig/mm/page-writeback.c
+++ linux-2.6.22/mm/page-writeback.c
@@ -426,6 +426,14 @@ static void balance_dirty_pages(struct a
bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
}+ printk(KERN_DEBUG "balance_dirty_pages written %lu %lu congested %d limits %lu %lu %lu %lu %lu %ld\n",
+ pages_written,
+ write_chunk - wbc.nr_to_write,
+ bdi_write_congested(bdi),
+ background_thresh, dirty_thresh,
+ bdi_thresh, bdi_nr_reclaimable, bdi_nr_writeback,
+ bdi_thresh - bdi_nr_reclaimable - bdi_nr_writeback);
+
if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
break;
if (pages_written >= write_chunk)---
mm/page-writeback.c | 5 +++++
1 file changed, 5 insertions(+)--- linux-2.6.22.orig/mm/page-writeback.c
+++ linux-2.6.22/mm/page-writeback.c
@@ -373,6 +373,7 @@ static void balance_dirty_pages(struct a
long bdi_thresh;
unsigned long pages_written = 0;
unsigned long write_chunk = sync_writeback_pages();
+ int i = 0;struct backing_dev_info *bdi = mapping->backing_dev_info;
@@ -434,6 +435,10 @@ static void balance_dirty_pages(struct a
bdi_thresh, bdi_nr_reclaimable, bdi_nr_writeback,
bdi_thresh - bdi_nr_reclaimable - bdi_nr_writeback);+ if (i++ > 200) {
+ printk(KERN_WARNING "balance_dirty_pages force break\n");
+ break;
+ }
if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
break;
if (pages_written >= write_chunk)Here are the output messages:
[ 1305.361511] balance_dirty_pages written 0 0 congested 0 limits 48869 195477 5801 5760 288 -247
[ 1305.361519] balance_dirty_page...
Hi
I've tried to compile 2.6.23-rc6-mm1, but it fails on ipg.c with the error :
Setup is 10964 bytes (padded to 11264 bytes).
System is 1640 kB
Kernel: arch/i386/boot/bzImage is ready (#1)
Building modules, stage 2.
MODPOST 2030 modules
WARNING: Can't handle masks in drivers/mtd/nand/cafe_nand:FFFF0
ERROR: "__udivdi3" [drivers/net/ipg.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2
error: Bad exit status from /home/trem/rpm/tmp/rpm-tmp.23895 (%build)I've instigated a bit, and I've found this code in ipg.c :
static void ipg_nic_txfree(struct net_device *dev)
{
struct ipg_nic_private *sp = netdev_priv(dev);
void __iomem *ioaddr = sp->ioaddr;
const unsigned int curr = ipg_r32(TFD_LIST_PTR_0) -
(sp->txd_map / sizeof(struct ipg_tx)) - 1;
unsigned int released, pending;sp->txd_map is an u64
because :
dma_addr_t txd_map;And in asm-i386/types.h, I see :
#ifdef CONFIG_HIGHMEM64G
typedef u64 dma_addr_t;
#else
typedef u32 dma_addr_t;
#endif
I my config, I use CONFIG_HIGHMEM64Gsizeof(struct ipg_tx) is an u32
So the div failed on i386 because of u64 / u32.I propose the following patch :
--- linux-2.6.22.old/drivers/net/ipg.c 2007-09-21 20:28:57.000000000 -0400
+++ linux-2.6.22.new/drivers/net/ipg.c 2007-09-21 20:22:15.000000000 -0400
@@ -837,7 +837,7 @@ static void ipg_nic_txfree(struct net_de
struct ipg_nic_private *sp = netdev_priv(dev);
void __iomem *ioaddr = sp->ioaddr;
const unsigned int curr = ipg_r32(TFD_LIST_PTR_0) -
- (sp->txd_map / sizeof(struct ipg_tx)) - 1;
+ do_div(sp->txd_map , sizeof(struct ipg_tx)) - 1;
unsigned int released, pending;IPG_DEBUG_MSG("_nic_txfree\n");
With this patch, it compiles on i386 and x86_64, but I haven't tested if
this network card works (I don't have it).regards,
trem-
On Sat, 22 Sep 2007 09:55:09 +0800 Fengguang Wu <wfg@mail.ustc.edu.cn>
<snip long series of mostly identical lines>
Most peculiar. It seems writeback_inodes() doesn't even attempt to
write out stuff. Nor are outstanding writeback pages completed.Could you perhaps instrument the writeback_inodes() path to see why
This looks like a sane series, we have a surplus of reclaimable pages,
start writeout, which increases writeback pages, and congests theMore stuff that doesn't look funny.
Curiously the lockup problem disappeared after upgrading to 2.6.23-rc6-mm1.
(need to watch it in a longer time window).Anyway here's the output of your patch:
sb_locked 0Still true. Another problem is that balance_dirty_pages() is being called even
when there are only 54 dirty pages. That could slow down writers unnecessarily.balance_dirty_pages() should not be entered at all with small nr_dirty.
Look at these lines:
[ 197.471619] balance_dirty_pages for tar written 405 405 congested 0 global 196554 54 403 196097 bdi 0 0 398 -398
[ 197.472196] balance_dirty_pages for tar written 405 0 congested 0 global 196554 54 372 196128 bdi 0 0 380 -380
[ 197.472893] balance_dirty_pages for tar written 405 0 congested 0 global 196554 54 372 196128 bdi 23 0 369 -346
[ 197.473158] balance_dirty_pages for tar written 405 0 congested 0 global 196554 54 372 196128 bdi 23 0 366 -343
[ 197.473403] balance_dirty_pages for tar written 405 0 congested 0 global 196554 54 372 196128 bdi 23 0 365 -342
[ 197.473674] balance_dirty_pages for tar written 405 0 congested 0 global 196554 54 372 196128 bdi 23 0 364 -341
[ 197.474265] balance_dirty_pages for tar written 405 0 congested 0 global 196554 54 372 196128 bdi 23 0 362 -339
[ 197.475440] balance_dirty_pages for tar written 405 0 congested 0 global 196554 54 341 196159 bdi 47 0 327 -280
[ 197.476970] balance_dirty_pages for tar written 405 0 congested 0 global 196546 54 279 196213 bdi 95 0 279 -184
[ 197.477773] balance_dirty_pages for tar written 405 0 congested 0 global 196546 54 248 196244 bdi 95 0 255 -160
[ 197.479463] balance_dirty_pages for tar written 405 0 congested 0 global 196546 54 217 196275 bdi 143 0 210 -67
[ 197.479656] balance_dirty_pages for tar written 405 0 congested 0 global 196546 54 217 196275 bdi 143 0 209 -66
[ 197.481159] balance_dirty_pages for tar written 405 0 congested 0 global 196546 54 155 196337 bdi 167 0 163 4The trace messages are generated by the following code:
--- linux-2.6.23-rc6-mm1.orig/m...
On Sun, 23 Sep 2007 09:20:49 +0800 Fengguang Wu <wfg@mail.ustc.edu.cn>
It this the delta during one of these lockups? If so, it would seem
that although dirty pages are reported against the BDI, no actual dirty
inodes could be found.[ note to self: writeback_inodes() seems to write out to any superblock
in the system. Might want to limit that to superblocks on wbc->bdi ]You say that switching to .23-rc6-mm1 solved it in your case. You are
developing in the writeback_inodes() path, right? Could it be one ofThat is an interesting idea how about this:
---
Subject: mm: speed up writeback ramp-up on clean systemsWe allow violation of bdi limits if there is a lot of room on the
system. Once we hit half the total limit we start enforcing bdi limits
and bdi ramp-up should happen. Doing it this way avoids many small
writeouts on an otherwise idle system and should also speed up the
ramp-up.Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---Index: linux-2.6/mm/page-writeback.c
===================================================================
--- linux-2.6.orig/mm/page-writeback.c
+++ linux-2.6/mm/page-writeback.c
@@ -355,8 +355,8 @@ get_dirty_limits(long *pbackground, long
*/
static void balance_dirty_pages(struct address_space *mapping)
{
- long bdi_nr_reclaimable;
- long bdi_nr_writeback;
+ long nr_reclaimable, bdi_nr_reclaimable;
+ long nr_writeback, bdi_nr_writeback;
long background_thresh;
long dirty_thresh;
long bdi_thresh;
@@ -376,9 +376,24 @@ static void balance_dirty_pages(struct aget_dirty_limits(&background_thresh, &dirty_thresh,
&bdi_thresh, bdi);
+
+ nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
+ global_page_state(NR_UNSTABLE_NFS);
+ nr_writeback = global_page_state(NR_WRITEBACK);
+
bdi_nr_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
- if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
+
+ /*
+ * break o...
no lockups, therefore not necessarily.
generic_sync_sb_inodes() does have something like:
if (wbc->bdi && bdi != wbc->bdi)
There are a lot of changes between them:
- bdi-v9 vs bdi-v10;
- a lot writeback patches in -mm
- some writeback patches maintained locallyIt looks like a workaround, but it does solve the most important problem.
And it is a good logic by itself. So I'd vote for it.The fundamental problem is that the per-bdi-writeback-completion based
estimation is not accurate under light loads. The problem remains for
a light-load sda when there is a heavy-load sdb. One more workaround
could be to grant bdi(s) a minimal bdi_thresh. Or better to adjust theThis may be slightly better:
if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
break;
/*
* Throttle it only when the background writeback cannot catchup.
*/
if (nr_reclaimable + nr_writeback <
(background_thresh + dirty_thresh) / 2)-
On Mon, 24 Sep 2007 11:01:10 +0800 Fengguang Wu <wfg@mail.ustc.edu.cn>
Well, sure, in that case sda would get to write out a lot of small
Ah, no, that is no good. For if there were a lot of BDIs this might
happen:Ah, indeed. Good idea.
-
Hmm, I cannot agree it to be fair - but pretty acceptable ;-)
Sure it is in the extreme case. However the limit could be ensured
if we really want(which I'm really not sure;-) it:if (nr_reclaimable + nr_writeback < dirty_thresh &&
bdi_nr_reclaimable + bdi_nr_writeback <= bdi_min_thresh)Thank you :-)
-
On Thu, 20 Sep 2007 12:31:39 +0100 (BST) Hugh Dickins
<hugh@veritas.com> wrote:-
No matter whether slub/slab is selected someone gets a page and moves/adds its
.lru head to some list after being in the deffered_pages list yet -- this breaks
the deffered_pages list and loops forever in list_for_each (unless LIST_DEBUG is
selected, which produces a BUG).Any ideas who could be the one pushing the page to some other list?
regards,
--
Jiri Slaby (jirislaby@gmail.com)
Faculty of Informatics, Masaryk University
-
Oh, only adds, if it moves, it won't break the list. Going to check for
POISON1/2 in __list_add, will get results (if any) in few moments...regards,
--
Jiri Slaby (jirislaby@gmail.com)
Faculty of Informatics, Masaryk University
-
Huh, it took longer than I expect :). Changed this:
diff --git a/arch/x86_64/mm/pageattr.c b/arch/x86_64/mm/pageattr.c
index 836c218..cd8499c 100644
--- a/arch/x86_64/mm/pageattr.c
+++ b/arch/x86_64/mm/pageattr.c
@@ -112,7 +112,14 @@ static inline void save_page(struct page *fpage, int data)
return;
SetPageFlush(fpage);
}
+ printk("ADD (%s): E=%p, H=%p, H->N=%p, N->P=%p, N->N=%p; PREV0=%p, "
+ "NEXT0=%p, ",
+ current->comm, &fpage->lru,
+ &deferred_pages, deferred_pages.next,
+ deferred_pages.next->prev, deferred_pages.next->next,
+ fpage->lru.prev, fpage->lru.next);
list_add(&fpage->lru, &deferred_pages);
+ printk("PREV1=%p, NEXT1=%p\n", fpage->lru.prev, fpage->lru.next);
}/*
@@ -274,6 +281,7 @@ void global_flush_tlb(void)
down_read(&init_mm.mmap_sem);
arg.full_flush = full_flush;
full_flush = 0;
+ printk("FLUSH\n");
list_replace_init(&deferred_pages, &arg.l);
up_read(&init_mm.mmap_sem);diff --git a/include/linux/list.h b/include/linux/list.h
index f29fc9c..1add963 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -265,6 +265,8 @@ static inline void list_del_init(struct list_head *entry)
static inline void list_move(struct list_head *list, struct list_head *head)
{
__list_del(list->prev, list->next);
+ list->next = LIST_POISON1;
+ list->prev = LIST_POISON2;
list_add(list, head);
}@@ -277,6 +279,8 @@ static inline void list_move_tail(struct list_head *list,
struct list_head *head)
{
__list_del(list->prev, list->next);
+ list->next = LIST_POISON1;
+ list->prev = LIST_POISON2;
list_add_tail(list, head);
}diff --git a/lib/list_debug.c b/l...
We should hold a single reference on the page for its membership in
deferred_pages.
-
The code is broken anyways. If you free pages without flushing
them first some other innocent user allocating them will end up
with possible uncached pages for some time.Does this simple patch help?
-Andi
Flush uncached AGP pages before freeing
Signed-off-by: Andi Kleen <ak@suse.de>
Index: linux/drivers/char/agp/generic.c
===================================================================
--- linux.orig/drivers/char/agp/generic.c
+++ linux/drivers/char/agp/generic.c
@@ -1185,6 +1185,7 @@ void agp_generic_destroy_page(void *addrpage = virt_to_page(addr);
unmap_page_from_agp(page);
+ flush_agp_mappings();
put_page(page);
free_page((unsigned long)addr);
atomic_dec(&agp_bridge->current_memory_agp);
-
I've attached a more complicated patch that does a 2 stage effort to
unmapping and freeing pages. My kernel no longer hangs with this
patch...Jiri can you confirm?
I'll look at the other issue separately..
Dave.
It's broken for me.
2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
-rc4-mm1: solid lock on X shutdown, random solid locks about
once every four hours
-rc6-mm1: solid lock on X startup
+your patch: screen goes black, turns off and on a few times during
startup, can reboot with sysrq-bVideo is:
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon R250
[Mobility FireGL 9000] (rev 02)--
Mathematics is the supreme nostalgia of our time.
-
Hmm.. maybe I'm chasing a different bug manifested by the same patch. For me,
it's been a solid lockup at X startup since -rc3-mm1, and this patch doesn't
change matters.
This patch probably changes behaviour how the pages are queued on the list
somehow. Maybe it's insane to suggest everybody with similar problem to try
LIST_DEBUG, but just give it a try after having one of the patches applied ;).
(Or have you tried yet?)regards,
--
Jiri Slaby (jirislaby@gmail.com)
Faculty of Informatics, Masaryk University
-
OK, had a chance to test it, with Dave Airlie's AGP patch, and here's what it hit:
[ 198.925000] list_del corruption. next->prev should be ffff81000118f178, but was ffffffff8067e050
[ 198.925000] ------------[ cut here ]------------
[ 198.925000] kernel BUG at lib/list_debug.c:72!
[ 198.925000] invalid opcode: 0000 [1] PREEMPT SMP
[ 198.925000] last sysfs file: /devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-adapter/i2c-1/i2c-1/dev
[ 198.925000] CPU 1
[ 198.925000] Modules linked in:(Yes, I wish I got a backtrace, but that's as long as it lived. Apparently,
the netconsole stuff actually writing this stuff out was over on CPU0 which then
proceeded to croak).Some odd SMP-related race? (x86_64 kernel on a Core2 Duo T7200, if it matters)
It is rather the other user who adds the page to some other list while being at
deferred_pages list. Could you try my debug patch
(http://lkml.org/lkml/2007/9/19/141)?--
Jiri Slaby (jirislaby@gmail.com)
Faculty of Informatics, Masaryk University
-
or the whitespace non-damaged version:
http://www.fi.muni.cz/~xslaby/sklad/pageattr_debug
-
Gaak. Is that thing *supposed* to spew zillions of lines of output?
Some of the hits we get (I'm wondering if anything after the first makes
any sense, or if we're just slowly watching the corruption spread - the
thing ended up near 23K lines long before I gave up and hit the poweroff
button because there was no end in sight):(If there's something specific you want me to find in the output,
like "the first time we see XYZ", yell...)[ 103.701000] POISONS (ffff81000117dc88): ffff810006d14000, ffff8100034225c0
[ 103.701000]
[ 103.701000] Call Trace:
[ 103.701000] [<ffffffff803580aa>] __list_add+0xd7/0x138
[ 103.701000] [<ffffffff80358117>] list_add+0xc/0x11
[ 103.701000] [<ffffffff80270865>] free_hot_cold_page+0xe8/0x16d
[ 103.701000] [<ffffffff8027093e>] free_hot_page+0xb/0xd
[ 103.701000] [<ffffffff80270958>] __free_pages+0x18/0x21
[ 103.701000] [<ffffffff80270990>] free_pages+0x2f/0x34
[ 103.701000] [<ffffffff8028922d>] kmem_freepages+0xc5/0xce
[ 103.701000] [<ffffffff8028957f>] slab_destroy+0x3c/0x53
[ 103.701000] [<ffffffff80289663>] free_block+0xcd/0x110
[ 103.701000] [<ffffffff8028973a>] drain_array+0x94/0xc9
[ 103.701000] [<ffffffff8028a8c3>] cache_reap+0x0/0x105
[ 103.701000] [<ffffffff8028a948>] cache_reap+0x85/0x105
[ 103.701000] [<ffffffff80243d5d>] run_workqueue+0x8e/0x125
[ 103.701000] [<ffffffff8024478d>] worker_thread+0x0/0xe7
[ 103.701000] [<ffffffff80244869>] worker_thread+0xdc/0xe7
[ 103.701000] [<ffffffff80247f13>] autoremove_wake_function+0x0/0x38
[ 103.701000] [<ffffffff80247ddd>] kthread+0x49/0x78
[ 103.701000] [<ffffffff8020cfc8>] child_rip+0xa/0x12
[ 103.701000] [<ffffffff80247d94>] kthread+0x0/0x78
[ 103.701000] [<ffffffff8020cfbe>] child_rip+0x0/0x12
[ 103.701000][ 103.701000] POISONS (ffff81000117eac0): ffff810006d55000, ffff8100034225c0
[ 103.701000]
[ 103.701000] Call Trace:
[...
Heh :). The few last before the list corruption BUG (you have to have LIST_DEBUG
enabled) -- but it seems you never reached that phase?regards,
--
Jiri Slaby (jirislaby@gmail.com)
Faculty of Informatics, Masaryk University
-
Seems to be somewhat racy - had one attempt that obviously got into some grand
Mongolian flustercluck, because I had a 2M printk buffer defined, and more
than 2M worth of apparently looping output saying that the netconsole/printk
path was poisoning. I defined the printk buffer to 4M, added a initcall_debug,
and then it managed to die in a reasonable amount of time. Here's the whole
thing from when it starts blurbing out the POISONS messages until it
rolls over and dies (about 736 lines).(Interestingly, I can't find any of the 3 addresses listed in the 'list_add
corruption' message anywhere *else* in the netconsole output, and the last thing
we hear from before the kersplat is apparently an RCU callback in a softirq?)[ 47.997000] POISONS (ffff810003fb6ca8): ffff810003fb6ca8, ffff8100051600d8
[ 47.997000]
[ 47.997000] Call Trace:
[ 47.998000] [<ffffffff803580aa>] __list_add+0xd7/0x138
[ 47.998000] [<ffffffff802765fe>] vma_prio_tree_add+0xc9/0xe0
[ 47.998000] [<ffffffff80276649>] vma_prio_tree_insert+0x34/0x39
[ 47.998000] [<ffffffff8027da6c>] vma_adjust+0x310/0x452
[ 47.998000] [<ffffffff8027dc89>] split_vma+0xdb/0xed
[ 47.998000] [<ffffffff8027f30b>] mprotect_fixup+0x13b/0x481
[ 47.998000] [<ffffffff80323da3>] file_map_prot_check+0x7d/0x86
[ 47.998000] [<ffffffff80326d93>] selinux_file_mprotect+0xe0/0xe9
[ 47.998000] [<ffffffff8027f803>] sys_mprotect+0x1b2/0x22b
[ 47.998000] [<ffffffff8020c2fc>] tracesys+0xdc/0xe1
[ 47.998000]
[ 48.078000] POISONS (ffff81000402d768): ffff810004727810, ffff810006221810
[ 48.078000]
[ 48.078000] Call Trace:
[ 48.078000] [<ffffffff803580aa>] __list_add+0xd7/0x138
[ 48.078000] [<ffffffff80358117>] list_add+0xc/0x11
[ 48.078000] [<ffffffff802765e2>] vma_prio_tree_add+0xad/0xe0
[ 48.078000] [<ffffffff802324f0>] copy_process+0xc63/0x1515
[ 48.078000] [<ffffffff80232eff>] do_fork+0x75/0x20b
[ 48...
Hmm, there must be somebody else who changes it under hands without list_add.
Any lru skin game in the mid-layer nvidia sources? Do slab and slub behave the same?--
Jiri Slaby (jirislaby@gmail.com)
Faculty of Informatics, Masaryk University
-
Haven't tried LIST_DEBUG yet. I'm spending most of the weekend camping, so
will likely be unable to test that until Monday-ish (unless I get lucky and can
get a test in during the next 2 hours)...
Does it work with my simple dumb patch instead of Dave's ?
-Andi
-
Sorry, forgot to mention: your one-liner flush also doesn't work (same
behavior).I suspect I'm tripping two things and the flushing thing fixes one but
not the other.--
Mathematics is the supreme nostalgia of our time.
-
Just an idea, if you enable LIST_DEBUG, it won't spit anything out after the one
of patches applied, right?regards,
--
Jiri Slaby (jirislaby@gmail.com)
Faculty of Informatics, Masaryk University
-
Full bisect needed then I guess. Ok as a short cut you could perhaps
the cpa-* patches first (might need to drop some later depending
patches), then the drm and agp trees.-Andi
-
The later depending patches:
x86_64-mm-cpa-clflush.patch
x86_64-mm-cpa-cleanup.patch
x86_64-mm-cpa-einval.patch
x86_64-mm-cpa-arch-macro.patch
intel-iommu-clflush_cache_range-now-takes-size-param.patch
This fixes the hang-when-quitting-X on the Vaio.
-
Checked.
-
In theory this should be handled by the caller, so as to avoid the
overhead of continuous flushing however I can see a potential race
condition here if the pages are put back into the kernel before the
caller flushes the mappings..Do we need some sort of two step approach here? as flushing after each
page would be a major overhead for dynamic agp stuff in the new memory
manager..-
thanks,
--
Jiri Slaby (jirislaby@gmail.com)
Faculty of Informatics, Masaryk University
-
> Yeah. (But X doesn't run -- this is maybe the known issue in this release).
What do you mean with not run?
-Andi
-
(II) intel(0): Initializing HW Cursor
(II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535)
(WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0
at offset 0x5ff000 failed (Invalid argument)Fatal server error:
Couldn't bind memory for front bufferI thought I'd seen a thread about this issue, but I can't find it now. Is it
known or am I seeing ghosts yet, Andrew?regards,
--
Jiri Slaby (jirislaby@gmail.com)
Faculty of Informatics, Masaryk University
-
Can you send me a complete Xorg log file?
and lspci -vv?
my 945 works fine with my drm tree on top of Linus with clflush + my
agp fix I just sent out ..Dave.
-
Maybe you are rather interested in these dmesg lines:
Linux agpgart interface v0.102
agpgart: suspend/resume problematic: resume with 3D/DRI active may lockup X.Org
on some chipset/BIOS combos (see DEBUG_AGP_PM in intel-agp.c)
agpgart: Detected an Intel G33 Chipset.
agpgart: Detected 8192K stolen memory.
agpgart: AGP aperture is 256M @ 0xd0000000
[drm] Initialized drm 1.1.0 20060810
ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 16
[drm] Initialized i915 1.6.0 20060119 on minor 0
...
set status page addr 0x00033000
agpgart: pg_start == 0x000005ff,intel_private.gtt_entries == 0x00000800
agpgart: Trying to insert into local/stolen memorySo the problem is, that X passes too low start.
The X log:
# lspci -vvx
00:00.0 Host bridge: Intel Corporation DRAM Controller (rev 02)
Subsystem: Intel Corporation DRAM Controller
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort+ >SERR- <PERR-
Latency: 0
Capabilities: [e0] Vendor Specific Information
00: 86 80 c0 29 06 00 90 20 02 00 00 06 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 c0 29
30: 00 00 00 00 e0 00 00 00 00 00 00 00 00 00 00 0000:02.0 VGA compatible controller: Intel Corporation Integrated Graphics
Controller (rev 02) (prog-if 00 [VGA])
Subsystem: Intel Corporation Integrated Graphics Controller
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin A routed to IRQ 16
Region 0: Memory at ffa80000 (32-bit, non-prefetchable) [size=512K]
Region 1: I/O ports at ec00 [size=8]
Region 2: Memory at d0000000 (32-bit, prefetchable) ...
I've cc'd Zhenyu who might be able to shed some light on this? can you
try 2.6.23-rc7 as maybe the G33 support still needs some work.. or
maybe I'm missing a patch in the drm..Dave.
-
Could you try current xf86-video-intel driver? just do
git clone git://anongit.freedesktop.org/git/xorg/driver/xf86-video-intelMy G33 was just verified broken today... so I'll try to reproduce it on
yep, should try 2.6.23-rc7 first, and it seems not drm relate.
-
It works! 3d problem, but it has maybe nothing to do with kernel:
$ glxinfo
name of display: :0.0
Unrecognized deviceID 29c2
X Error of failed request: GLXBadContext
...regards,
--
Jiri Slaby (jirislaby@gmail.com)
Faculty of Informatics, Masaryk University
-
yep, I also pushed a fix for G33 in xf86-video-intel when fixing the intel agp.
It looks you have an old version of mesa, that i915 dri driver doesn't know
your chipset. Try mesa-7.X.I have also seen X exit broken with 2.6.23-rc6-mm1, will follow this thread
and try Dave's patch.Thanks for testing!
-
That would probably have been me, saying that x86_64-mm-cpa-clflush.patch broke
the NVidia graphics driver in 23-rc3-mm1. Is it breaking *other* X drivers as
well?
Yes, the issue is there from rc3-mm1, see (intel and radeon cards are affected
in my case):
http://lkml.org/lkml/2007/9/9/51But now I'm talking about another issue -- a regression since rc4-mm1, where X
server is unable to bind agp memory (those x logs above). The clflush issue has
solved andi in
http://lkml.org/lkml/2007/9/19/334
recentlyregards,
--
Jiri Slaby (jirislaby@gmail.com)
Faculty of Informatics, Masaryk University
-
Tried that, my laptop still bricks the instant X starts up and the NVidia driver
tries to initialize. Not even sysrq-foo works. Time to power-cycle.
I'd expect the binary to be doing something stupid with it's flushing
and relying on the kernel to do something it no longer does.. so this
is most likely a case of not fixable..Dave.
-
Further info:
4690 write(0, "(II) intel(0): Initializing HW C"..., 38) = 38
4690 write(0, "(II) intel(0): xf86BindGARTMemor"..., 76) = 76
4690 ioctl(9, AGPIOC_BIND, 0x7fffbe1cb850) = -1 EINVAL (Invalid argument)
4690 write(0, "(WW) intel(0): xf86BindGARTMemor"..., 115) = 115--
Jiri Slaby (jirislaby@gmail.com)
Faculty of Informatics, Masaryk University
-
On Wed, 19 Sep 2007 22:01:59 +0200
This might be a Dave thing and not an Andi thing.
In my usual -mm-testing I only test X on the one machine (the Vaio,
natch). Check that it runs glxgears, check suspend/resume to mem and disk.
It has intel graphics and I'm not seeing any such problems.Have you time to bisect it?
http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt
describes how.As a quick test, perhaps build a tree with just
2.6.23-rc6+origin.patch+git-drm.patch? Fortunately git-agpgart.patch is
presently empty.-
Seeing the following panic booting an old powerpc LPAR:
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc000000000047b48
cpu 0x0: Vector: 300 (Data Access) at [c0000000006a3750]
pc: c000000000047b48: .pSeries_log_error+0x364/0x420
lr: c000000000047acc: .pSeries_log_error+0x2e8/0x420
sp: c0000000006a39d0
msr: 8000000000001032
dar: 0
dsisr: 42000000
current = 0xc0000000005acab0
paca = 0xc0000000005ad700
pid = 0, comm = swapper
enter ? for help
[c0000000006a3af0] c000000000021164 .rtas_call+0x200/0x250
[c0000000006a3ba0] c000000000049d50 .early_enable_eeh+0x168/0x360
[c0000000006a3c70] c00000000002f674 .traverse_pci_devices+0x8c/0x138
[c0000000006a3d10] c000000000560ce8 .eeh_init+0x1a8/0x200
[c0000000006a3db0] c00000000055fb70 .pSeries_setup_arch+0x128/0x234
[c0000000006a3e40] c00000000054f830 .setup_arch+0x214/0x24c
[c0000000006a3ee0] c000000000546a38 .start_kernel+0xd4/0x3e4
[c0000000006a3f90] c00000000045adc4 .start_here_common+0x54/0x58
0:mon>This machine is:
# cat /proc/cpuinfo
processor : 0
cpu : POWER4+ (gq)
clock : 1703.965296MHz
revision : 19.0[...]
machine : CHRP IBM,7040-681-apw
-
I am seeing this strange link error from a PowerMac G5 (powerpc):
[...]
KSYM .tmp_kallsyms2.S
AS .tmp_kallsyms2.o
LD vmlinux.o
ld: dynreloc miscount for fs/built-in.o, section .opd
ld: can not edit opd Bad value
make: *** [vmlinux.o] Error 1Compiler version below.
root@elm3b19:~# gcc -v
Using built-in specs.
Target: powerpc-linux-gnu
Configured with: ../src/configure -v
--enable-languages=c,c++,java,f95,objc,ada,treelang --prefix=/usr
--enable-shared --with-system-zlib --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix --enable-nls
--program-suffix=-4.0 --enable-__cxa_atexit --enable-clocale=gnu
--enable-libstdcxx-debug --enable-java-awt=gtk-default
--enable-gtk-cairo
--with-java-home=/usr/lib/jvm/java-1.4.2-gcj-4.0-1.4.2.0/jre
--enable-mpfr --disable-softfloat
--enable-targets=powerpc-linux,powerpc64-linux --with-cpu=default32
--disable-werror --enable-checking=release powerpc-linux-gnu
Thread model: posix
gcc version 4.0.3 (Ubuntu 4.0.3-1ubuntu5)-apw
-
We have had this issue before.
Try to see:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commi...Here it was caused by a weak declaration that was superflous.
I expect the workaround to be equally simple this time - or I hope so.
Sam
-
Yep, will investigate this as suggested. As this problem seems to
persist thru 2.6.23-rc8-mm1 I will report it again there and my
findings.-apw
-
It's an ld error, could you show us your ld version instead? And
please try with current mainline ld, too?Segher
-
root@elm3b19:~# ld -v
GNU ld version 2.16.91 20060118 Debian GNU/LinuxGetting the compiler suite changed on here is going to be a lot
tricker. One of the reasons we keep it back there is those versions
are supposed to be supported and we want to test those combinations.-apw
-
Hello,
I wrote a simple script that finds all modules in /lib/modules/`uname -r`
and performs sth like 'modprobe $x; rmmod $x;' for every of them. The result is the
output below. This actually happens when rmmod'ing a module.I narrowed it down to modules mmc_core and firewire_core. To reproduce that just
rmmod them.PS. X doesn't work but that's another story.
Regards,
Mariusz
WARNING: at lib/kref.c:33 kref_get()
[<c0104a1c>] dump_trace+0x202/0x22b
[<c0104a5f>] show_trace_log_lvl+0x1a/0x30
[<c0105608>] show_trace+0x12/0x14
[<c010572c>] dump_stack+0x15/0x17
[<c024c567>] kref_get+0x40/0x42
[<c024b81e>] kobject_get+0x12/0x17
[<c02d5800>] bus_get+0x11/0x22
[<c02d5a89>] bus_remove_file+0xe/0x27
[<c02d5ab2>] remove_probe_files+0x10/0x1f
[<c02d5afc>] bus_unregister+0x3b/0x66
[<dee3c2bd>] sdio_unregister_bus+0xd/0xf [mmc_core]
[<dee3d66d>] mmc_exit+0xd/0x23 [mmc_core]
[<c013c9d6>] sys_delete_module+0x154/0x202
[<c010406a>] sysenter_past_esp+0x5f/0x99
[<ffffe410>] 0xffffe410
=======================
WARNING: at lib/kref.c:33 kref_get()
[<c0104a1c>] dump_trace+0x202/0x22b
[<c0104a5f>] show_trace_log_lvl+0x1a/0x30
[<c0105608>] show_trace+0x12/0x14
[<c010572c>] dump_stack+0x15/0x17
[<c024c567>] kref_get+0x40/0x42
[<c024b81e>] kobject_get+0x12/0x17
[<c02d5800>] bus_get+0x11/0x22
[<c02d5a89>] bus_remove_file+0xe/0x27
[<c02d5abe>] remove_probe_files+0x1c/0x1f
[<c02d5afc>] bus_unregister+0x3b/0x66
[<dee3c2bd>] sdio_unregister_bus+0xd/0xf [mmc_core]
[<dee3d66d>] mmc_exit+0xd/0x23 [mmc_core]
[<c013c9d6>] sys_delete_module+0x154/0x202
[<c010406a>] sysenter_past_esp+0x5f/0x99
[<ffffe410>] 0xffffe410
=======================
BUG: atomic counter underflow at:
[<c0104a1c>] dump_trace+0x202/0x22b
[<c0104a5f>] show_trace_log_lvl+0x1a/0x30
[<c0105608>]...
On Tue, 18 Sep 2007 23:45:13 +0200,
Hm, does that mean you can reproduce this when you modprobe/rmmod
either of them on a fresh boot? (There could be follow-on errors...)Is this the first error message you see? This looks like the embedded
kset had its refcount decreased too much. (Or was the sdio bus never
registered? Don't see how this could happen if mmc_init was successful,
though.)Strange. Can't see how this can happen except as a follow-on error.
(Following errors don't look like they could have anything to do with
the driver core.)-
On fresh boot it happens with mmc_core. On the next fresh boot it didn't happen with
firewire_core. So you can be right that some of them are follow-on errors. Anyway to
trigger it just run these commands:# logger -t "foobar" "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
# modprobe mmc_core
# rmmod mmc_corePlease find it attached below.
Regards,
Mariusz
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
kobject mmc_core: registering. parent: <NULL>, set: module
kobject holders: registering. parent: mmc_core, set: <NULL>
kobject_uevent_env
kobject filter function caused the event to drop!
kobject_uevent_env
fill_kobj_path: path = '/module/mmc_core'
---- begin silly warning ----
This is a janitorial warning, not a kernel bug.
The kobject at, or inside 'mmc_bus_type'@(0xded0c1b0) is not dynamically allocated.
kobjects must be dynamically allocated, not static
---- end silly warning ----
kobject mmc: registering. parent: <NULL>, set: bus
kobject_uevent_env
fill_kobj_path: path = '/bus/mmc'
---- begin silly warning ----
This is a janitorial warning, not a kernel bug.
The kobject at, or inside 'mmc_bus_type'@(0xded0c290) is not dynamically allocated.
kobjects must be dynamically allocated, not static
---- end silly warning ----
kobject devices: registering. parent: mmc, set: <NULL>
kobject_uevent_env
kobject filter function caused the event to drop!
---- begin silly warning ----
This is a janitorial warning, not a kernel bug.
The kobject at, or inside 'mmc_bus_type'@(0xded0c220) is not dynamically allocated.
kobjects must be dynamically allocated, not static
---- end silly warning ----
kobject drivers: registering. parent: mmc, set: <NULL>
kobject_uevent_env
kobject filter function caused the event to drop!
---- begin silly warning ----
This is a janitorial warning, not a kernel bug.
The kobject at, or inside 'mmc_host_class'@(0xded0c4f8) is not dynamically allocated.
kobjects must be dynamically allocated, not static
---- end silly warning -...
On Wed, 19 Sep 2007 18:43:54 +0200,
OK, this seems to point to mmc_core then. (I couldn't reproduce it with
other modules registering a bus type like iucv either, so it doesn'tUnfortunately this would involve setting up a non-s390 test box, which
<Unrelated side note: We should probably save the kobject's name for
And now for some reason the reference count for the sdio bus has
already dropped to 0. Strange, since the addition/removal of stuff
below it looks symmetric, and it doesn't seem to do anything differentThe bus_get()/bus_put() stuff will lead to repeated calls to the
release function in this error case, that explains the following
errors. (By the way, isn't the bus_get()/but_pus() pair in
bus_remove_file() superfluous? The caller should hold a reference
anyway.)What completely confuses me here is that the devices/drivers ksets have
already been unregistered (see above), when remove_probe_files() shouldAnd here sdio is unregistered *again*. (The <NULL> indicates that this
-ECONFUSED. Perhaps DEBUG_DRIVER may help some more. Or /me getting
some sleep.
-
On Wed, 19 Sep 2007 20:02:05 +0200,
Sleeping helped (improved reading abilities).
From: Cornelia Huck <cornelia.huck@de.ibm.com>
mmc: Avoid double sdio_unregister_bus() on module unload.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
---
drivers/mmc/core/core.c | 1 -
1 files changed, 1 deletion(-)--- linux-2.6.orig/drivers/mmc/core/core.c
+++ linux-2.6/drivers/mmc/core/core.c
@@ -820,7 +820,6 @@ static void __exit mmc_exit(void)
{
sdio_unregister_bus();
mmc_unregister_key_type();
- sdio_unregister_bus();
mmc_unregister_host_class();
mmc_unregister_bus();
destroy_workqueue(workqueue);
-
Ok - I can confirm that this patch fixes the problem.
Thanks,
-
On Thu, 20 Sep 2007 15:35:18 +0200
oooops. :)
This has been pushed to kernel.org. Andrew, update at will.
Rgds
--
-- Pierre OssmanLinux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
-
On Wed, 19 Sep 2007 20:02:05 +0200,
kobject: Temporarily save k_name on cleanup for debug message.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
---
lib/kobject.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)--- linux-2.6.orig/lib/kobject.c
+++ linux-2.6/lib/kobject.c
@@ -498,9 +498,9 @@ void kobject_cleanup(struct kobject * ko
struct kobj_type * t = get_ktype(kobj);
struct kset * s = kobj->kset;
struct kobject * parent = kobj->parent;
+ const char *k_name = kobj->k_name;pr_debug("kobject %s: cleaning up\n",kobject_name(kobj));
- kfree(kobj->k_name);
kobj->k_name = NULL;
if (t && t->release)
t->release(kobj);
@@ -508,8 +508,8 @@ void kobject_cleanup(struct kobject * ko
pr_debug("kobject '%s' does not have a release() function, "
"if this is not a directory kobject, it is broken "
"and must be fixed.\n",
- kobject_name(kobj));
-
+ k_name);
+ kfree(k_name);
if (s)
kset_put(s);
kobject_put(parent);
-
Just FYI this is what it looks like with your patch applied:
kobject mmc_core: registering. parent: <NULL>, set: module
kobject holders: registering. parent: mmc_core, set: <NULL>
kobject_uevent_env
kobject filter function caused the event to drop!
kobject_uevent_env
fill_kobj_path: path = '/module/mmc_core'
---- begin silly warning ----
This is a janitorial warning, not a kernel bug.
The kobject at, or inside 'mmc_bus_type'@(0xdee71c10) is not dynamically allocated.
kobjects must be dynamically allocated, not static
---- end silly warning ----
kobject mmc: registering. parent: <NULL>, set: bus
kobject_uevent_env
fill_kobj_path: path = '/bus/mmc'
---- begin silly warning ----
This is a janitorial warning, not a kernel bug.
The kobject at, or inside 'mmc_bus_type'@(0xdee71cf0) is not dynamically allocated.
kobjects must be dynamically allocated, not static
---- end silly warning ----
kobject devices: registering. parent: mmc, set: <NULL>
kobject_uevent_env
kobject filter function caused the event to drop!
---- begin silly warning ----
This is a janitorial warning, not a kernel bug.
The kobject at, or inside 'mmc_bus_type'@(0xdee71c80) is not dynamically allocated.
kobjects must be dynamically allocated, not static
---- end silly warning ----
kobject drivers: registering. parent: mmc, set: <NULL>
kobject_uevent_env
kobject filter function caused the event to drop!
---- begin silly warning ----
This is a janitorial warning, not a kernel bug.
The kobject at, or inside 'mmc_host_class'@(0xdee71f58) is not dynamically allocated.
kobjects must be dynamically allocated, not static
---- end silly warning ----
---- begin silly warning ----
This is a janitorial warning, not a kernel bug.
The kobject at, or inside 'mmc_host_class'@(0xdee71ed0) is not dynamically allocated.
kobjects must be dynamically allocated, not static
---- end silly warning ----
kobject mmc_host: registering. parent: <NULL>, set: class
kobject_uevent_env
fill_kobj_path: p...
On Wed, 19 Sep 2007 20:02:05 +0200,
kobject: Temporarily save k_name on cleanup for debug message.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
---
lib/kobject.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)--- linux-2.6.orig/lib/kobject.c
+++ linux-2.6/lib/kobject.c
@@ -498,9 +498,9 @@ void kobject_cleanup(struct kobject * ko
struct kobj_type * t = get_ktype(kobj);
struct kset * s = kobj->kset;
struct kobject * parent = kobj->parent;
+ const char *k_name = kobj->k_name;pr_debug("kobject %s: cleaning up\n",kobject_name(kobj));
- kfree(kobj->k_name);
kobj->k_name = NULL;
if (t && t->release)
t->release(kobj);
@@ -508,8 +508,8 @@ void kobject_cleanup(struct kobject * ko
pr_debug("kobject '%s' does not have a release() function, "
"if this is not a directory kobject, it is broken "
"and must be fixed.\n",
- kobject_name(kobj));
-
+ k_name);
+ kfree(k_name);
if (s)
kset_put(s);
kobject_put(parent);
-
Hello,
I got the following error when building 2.6.23-rc6-mm1 on sparc:
/opt/crosstool/gcc-4.1.1-glibc-2.3.6/sparc-unknown-linux-gnu/bin/sparc-unknown-linux-gnu-gcc -Wp,-MD,drivers/serial/.serial_core.o.d -nostdinc -isystem /opt/crosstool/gcc-4.1.1-glibc-2.3.6/sparc-unknown-linux-gnu/lib/gcc/sparc-unknown-linux-gnu/4.1.1/include -D__KERNEL__ -Iinclude -Iinclude2 -I/home/compudj/git/linux-2.6-lttng/include -include include/linux/autoconf.h -I/home/compudj/git/linux-2.6-lttng/drivers/serial -Idrivers/serial -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Os -m32 -pipe -mno-fpu -fcall-used-g5 -fcall-used-g7 -fomit-frame-pointer -fno-stack-protector -Wdeclaration-after-statement -Wno-pointer-sign -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(serial_core)" -D"KBUILD_MODNAME=KBUILD_STR(serial_core)" -c -o drivers/serial/.tmp_serial_core.o /home/compudj/git/linux-2.6-lttng/drivers/serial/serial_core.c
/home/compudj/git/linux-2.6-lttng/drivers/serial/serial_core.c: In function 'uart_suspend_port':
/home/compudj/git/linux-2.6-lttng/drivers/serial/serial_core.c:1980: error: implicit declaration of function 'enable_irq_wake'
/home/compudj/git/linux-2.6-lttng/drivers/serial/serial_core.c: In function 'uart_resume_port':
/home/compudj/git/linux-2.6-lttng/drivers/serial/serial_core.c:2035: error: implicit declaration of function 'disable_irq_wake'
make[3]: *** [drivers/serial/serial_core.o] Error 1
make[2]: *** [drivers/serial] Error 2
make[1]: *** [drivers] Error 2
make: *** [_all] Error 2config:
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-rc6-mm1
# Tue Sep 18 11:06:46 2007
#
CONFIG_MMU=y
CONFIG_HIGHMEM=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_ARCH_NO_VIRT_TO_BUS=y
CONFIG_OF=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
...
On Tue, 18 Sep 2007 16:54:03 -0400
hm, I wonder why I didn't hit that.
enable_irq_wake() was added by wake-up-from-a-serial-port.patch
I note that git-input adds a call too, and might have a problem
with !CONFIG_GENERIC_HARDIRQS.Not sure what the best fix is here. We could sprinkle ifdefs all
over the code, or just add the suitable empty stubs for enable_irq_wake(),
etc.
-
Provide {enable,disable}_irq_wakeup dummies for undefined
CONFIG_GENERIC_HARDIRQS case. Completely untested, as I don't even have
cross-compilers for platforms without CONFIG_GENERIC_HARDIRQS.Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
---
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 5323f62..ecade41 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -205,6 +205,9 @@ static inline int disable_irq_wake(unsigned int irq)
enable_irq(irq)
# endif+#define enable_irq_wake(irq) ({ (void)(irq); 0; })
+#define disable_irq_wake(irq) ({ (void)(irq); 0; })
+
#endif /* CONFIG_GENERIC_HARDIRQS */#ifndef __ARCH_SET_SOFTIRQ_PENDING
-
It builds fine now.
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
-
On my HP nx6325 it only boots with "noacpitimer nohpet" on the command line,
but then it works. Suspend-to-RAM and hibernation work too. :-)Since 2.6.23-rc4-mm1 only booted with nohpet because of
x86_64-convert-to-clockevents.patch
I guess that the boot problems with this one result from the same patch.
Greetings,
Rafael
-
It _sometimes_ boots with "noacpitimer nohpet" and that's if I press the power
button for a couple of times during boot (before any messages appear on theNot sure any more ...
I'll try to compile it with NO_HZ and HIGH_RES_TIMERS unset.
Greetings,
Rafael
-
OK, in that configuration it's much better.
It boots with nohpet alone and suspend/hibernation seem to work (still,
it didn't want to boot right after hibernation, but booted after I'd switched
it off/on manually).Unfortunately, I get this in dmesg:
ALSA /home/rafael/src/mm/linux-2.6.23-rc6-mm1/sound/pci/hda/hda_intel.c:1758: hda-intel: ioremap error
and (obviously) the sound card doesn't work.
Additionally, I've got a couple of these:
WARNING: at /home/rafael/src/mm/linux-2.6.23-rc6-mm1/drivers/usb/core/driver.c:1
217 usb_autopm_do_device()Call Trace:
[<ffffffff8813885e>] :usbcore:usb_autopm_do_device+0x60/0xe9
[<ffffffff88138910>] :usbcore:usb_autosuspend_device+0xc/0xe
[<ffffffff88131aa8>] :usbcore:usb_disconnect+0x15f/0x18c
[<ffffffff88133305>] :usbcore:hub_thread+0x691/0x10a1
[<ffffffff8024a077>] autoremove_wake_function+0x0/0x38
[<ffffffff88132c74>] :usbcore:hub_thread+0x0/0x10a1
[<ffffffff80249f50>] kthread+0x49/0x79
[<ffffffff8020ce98>] child_rip+0xa/0x12
[<ffffffff80249f07>] kthread+0x0/0x79
[<ffffffff8020ce8e>] child_rip+0x0/0x12Greetings,
Rafael
-
Can you please check, whether
http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patch
works for you ?
tglx
-
Nope. It's a total disaster. :-(
Doesn't boot at all, even with "noacpitimer nohpet", and that's with
NO_HZ and HIGH_RES_TIMERS unset.If you have a bisectable patch series, I can try to identify the responsible
patch.Greetings,
Rafael
-
True. I have instrumented it to the point where the broadcast device is
http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patche...
The first patches in the queue are the mainline fixups.
tglx
-
It's x86_64-convert-to-clockevents.patch (ie. after applying it the box stops
to boot).I haven't had the time to check if any special command line arguments help.
Will check tomorrow.Greetings,
Rafael
-
Can you please disable the patches, which I sent Linus wards:
timekeeping-access-rtc-outside-xtime-lock.patch
xtime-supsend-resume-fixup.patch
acpi-reevaluate-c-p-t-states.patch
clockevents-enforce-broadcast-on-resume.patch
clockevents-do-not-shutdown-broadcast-device-in-oneshot-mode.patch
clockevents-prevent-stale-tick-update-on-offline-cpu.patchWithout those patches you get the state of rc4-mm1. It would be
interesting to know which one interferes with the acpi stuff.tglx
-
I have skipped all of them, but the resulting kernel behaves in the same
It looks like something else went in between -rc4 and -rc6 that broke your
patch. I wonder what it might be ...Greetings,
Rafael
-
Hmm. Can you please go back in the -hrt project history:
http://tglx.de/projects/hrtimers/2.6.23-rc5/patch-2.6.23-rc5-hrt1.patche...
http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patche...Also, can you send me your .config file please ?
Vs. the suspend / resume wreckage of rc6-mm1 / rc6-hrt2:
I'm still fishing in rather dark water. Depending on the added
instrumentation points the problem mutates up to the point where it
vanishes completely. The hang, which requires key strokes again, happens
consistently at the same place:The notifier call in kernel/cpu.c::_cpu_up()
ret = __raw_notifier_call_chain(&cpu_chain, CPU_UP_PREPARE | mod, hcpu,
-1, &nr_calls);does not return, but _all_ registered notifiers are called and reach
their return statement. This reminds me on:http://lkml.org/lkml/2007/5/9/46
Sigh. I have no clue where to dig further.
tglx
-
Well, the above may affect SMP systems, but the Vaio is UP. Hmm?
Greetings,
Rafael
Each of them on top of 2.6.23-rc6 gives the same symptoms as rc6-hrt2 (ie. the
box doesn't boot).I'm going to check if -rc5 with patch-2.6.23-rc4-hrt1 on top of it works and
if not (I suspect so), I'll bisect the Linus' tree between -rc4 and -rc5 in
order to identify the responsible patch.Greetings,
Rafael
-
My jinxed VAIO variant is SMP, but it looks like the same mysterious
error.tglx
-
Hm. Have you tried
# echo test > /sys/power/disk
# echo disk > /sys/power/state(should suspend devices and disable the nonboot CPUs, wait for 5 sec. and
restore everything)?Greetings,
Rafael
-
Works fine, but I need to reboot into a non debug kernel to verify.
tglx
-
Works as well. What's the difference between this and the real thing ?
tglx
-
The real thing also calls device_power_down(PMSG_FREEZE), which is a
counterpart of sysdev_shutdown(), more or less, and I think that's what goes
belly up.You can use the patch below (on top of -rc6-mm1), which just disables the image
creation (that should be irrelevant anyway) and see what happens.Greetings,
Rafael---
kernel/power/disk.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)Index: linux-2.6.23-rc6-mm1/kernel/power/disk.c
===================================================================
--- linux-2.6.23-rc6-mm1.orig/kernel/power/disk.c
+++ linux-2.6.23-rc6-mm1/kernel/power/disk.c
@@ -168,13 +168,14 @@ int create_image(int platform_mode)
}save_processor_state();
- error = swsusp_arch_suspend();
- if (error)
- printk(KERN_ERR "Error %d while creating the image\n", error);
+ //error = swsusp_arch_suspend();
+ //if (error)
+ // printk(KERN_ERR "Error %d while creating the image\n", error);
/* Restore control flow magically appears here */
restore_processor_state();
- if (!in_suspend)
- platform_leave(platform_mode);
+ //if (!in_suspend)
+ // platform_leave(platform_mode);
+ in_suspend = 0;
/* NOTE: device_power_up() is just a resume() for devices
* that suspended with irqs off ... no overall powerup.
*/
-
Rafael,
In meantime I figured out what's happening. The ordering in
hibernate_snapshot() is wrong. It does:swsusp_shrink_memory();
suspend_console();
device_suspend(PMSG_FREEZE);
platform_prepare(platform_mode);disable_nonboot_cpus();
swsusp_suspend();
enable_nonboot_cpus();
platform_finish(platform_mode);
device_resume();
resume_console();We disable everything in device_suspend() including timekeeping, so any
code which is depending on working timekeeping and timer functionality
(which is suspended in timekeeping_suspend() as well) is busted.enable_nonboot_cpus() definitely relies on working timekeeping and
timers depending on the codepath. It's just a surprise that this did not
blow up earlier (also before clock events).I changed the ordering of the above to:
disable_nonboot_cpus();
swsusp_shrink_memory();
suspend_console();
device_suspend(PMSG_FREEZE);
platform_prepare(platform_mode);
swsusp_suspend();
platform_finish(platform_mode);
device_resume();
resume_console();enable_nonboot_cpus();
and non-surprisingly the "my VAIO needs help from keyboard" problem went
away immediately. See patch below. (on top of rc7-hrt1, -mm1 does not
work at all on my VAIO due to some yet not identified wreckage)I did not yet look into the suspend to ram code, but I guess that there
is an equivalent problem.But I have no idea why this affects Andrews jinxed VAIO (UP machine),
though I suspect that we have more timekeeping/timer depending code
somewhere waiting to bite us.Also I still need to debug why the HIBERNATION_TEST code path (which has
a msleep(5000) in it) does not fail, but I postpone this until tomorrow
morning. I'm dead tired after hunting this Heisenbug which changes with
every other printk added to the code. I'm going to add some really noisy
messages for everything which accesses timekeeping / timers _after_
those systems have bee...
Hmm. This is close to the ordering we have in STR too.
I have some dim memory of there being some ACPI reason why it had to be
done that way.In fact, this was done in commit e3c7db621bed4afb8e231cb005057f2feb5db557,
long ago, by Rafael:As indicated in a recent thread on Linux-PM, it's necessary to call
pm_ops->finish() before devce_resume(), but enable_nonboot_cpus() has to be
called before pm_ops->finish() (cf.
http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html). For
consistency, it seems reasonable to call disable_nonboot_cpus() after
device_suspend().This way the suspend code will remain symmetrical with respect to the resume
code and it may allow us to speed up things in the future by suspending and
resuming devices and/or saving the suspend image in many threads.The following series of patches reorders the suspend and resume code so that
nonboot CPUs are disabled after devices have been suspended and enabled before
the devices are resumed. It also causes pm_ops->finish() to be called after
enable_nonboot_cpus() wherever necessary.Hmm?
It's entirely possible that that commit was simply just buggy, and we
should indeed move the CPU down/up to be early/late - we've fixed other
ordering issues since that commit went in. But this whole area is very
murky.(Btw, the above commit message points to just my response with a testing
patch to the real email: the actual explanation of the INSANE ordering is
from Len Brown inhttps://lists.linux-foundation.org/pipermail/linux-pm/2006-November/0041...
and there Len claims that we *must* wake up CPU's early).
I personally think that the whole ACPI ordering requirements are just
insane, but the point of this email is to point these different
requirements out, and hopefully we can get something that works for
everybody.Len added to Cc.
Len? Thomas wants to call 'disable_nonboot_cpus()' early, and
'enable_nonboot...
..and points to commit 1a38416cea8ac801ae8f261074721f35317613dc which in
turn talks about http://bugzilla.kernel.org/show_bug.cgi?id=5651Howerver, it seems that bugzilla entry may just be bogus. It talks about
"it appears that some firmware in the future may depend on that sequence
for correction operation"Len, Shaohua, what are the real issues here?
It would indeed be nice if we could just take CPU's down early (while
everything is working), and run the whole suspend code with just one CPU,
rather than having to worry about the ordering between CPU and device
takedown.That said, at least with STR, the situation is:
1) suspend_console
2) device_suspend(PMSG_SUSPEND) (== ->suspend)
3) disable_nonboot_cpus()
4) device_power_down(PMSG_SUSPEND) (== ->suspend_late)
5) pm_ops->enter()
6) device_power_up() (== ->resume_early)
7) enable_nonboot_cpus()
8) pm_finish()
9) device_resume() (== ->resume
10) resume_consoleSo if we agree that things like timers etc should *never* be suspended by
the early suspend, and *always* use "suspend_late/resume_early", then at
least STR should be ok.And I think that's a damn reasonable thing to agree on: timers (and
anything else that CPU shutdown/bringup could *possibly* care about)
should be considered core enough that they had better be on the
suspend_late/resume_early list.Thomas, Rafael, can you verify that at least STR is ok in this respect?
Linus
-
That is certainly what we want to do on powerpc.
Paul.
-
I would have expected that we do it exactly this way and it took me by
surprise, that we do not.tglx
-
Well, we used to do that, but acpi spec forbids that, and it means
userspace sees plugs/unplugs.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
Intel's reference BIOS for Core Duo performs some re-initialization
in _WAK that will get blow away if INIT follows _WAK.
IIR, it is related to re-initializing the thermal sensors.
I opened bug 5651 when the BIOS team informed me of this issue.Yes, bringing a processor offline and then online again w/o
an intervening suspend or reset would not evaluate _WAK,
and thus may still run into the issue.I don't know if this is a widespread issue and a commonly
used BIOS hook, or if it is specific to certain processors.-
If this is true, then we should disable the sys/..../cpu/online entry
right away.tglx
-
Or drop the execution of _INI from the CPU hotplug, if possible ...
Greetings,
Rafael
-
Linus,
-ETOOTIRED led me too a wrong conclusion, but still it is a valuable
hint that this change is making things work again. I need to go down
into the details of the swsusp_suspend() code path to figure out, what's
the root cause.Sorry for the noise, but I'm zooming in.
tglx
-
If you need any help from me with that, please let me know.
Greetings,
Rafael
-
Rafael,
I'm zooming in. It seems, that the ACPI idle code plays tricks with us.
After debugging the swsusp_suspend() code path I figured out, that we
end up in C2 or deeper power states while we run the suspend code. The
same happens when we come back on resume. It looks like we disable stuff
in the ACPI BIOS, which makes the C2 and deeper power states misbehave.
I hacked the idle loop arch code to use halt() right before we call
device_suspend() and switch back to the acpi idle code right after
device_resume(). This solves the problem as well.Len, any opinion on this one ?
tglx
-
Thomas,
Hm, can you please run the test I've suggested in another branch of the
thread, ie.# echo shutdown > /sys/power/disk
# echo disk > /sys/power/statewithout your debugging code in disk.c?
This makes the hibernation code omit the major ACPI hooks, so if it works,
Well, that seems less intrusive than changing the code ordering right before
the major kernel release, but I think we should do our best to understand what
_exactly_ is happening here.Greetings,
Rafael
-
Rafael,
Yes, this works fine. We still go into C3, but this seems not longer to
I found some other subtle thinko in the clock events code while I was
heading down the swsusp_suspend code path. I wait for confirmation that
it does not brick some endangered boxen, though. Still with this change
in the clock events code, my VAIO goes into C2 or C3 and causes the box
to wait for a helping keystroke.The correct solution would be, that the ACPI code ignores the lower
C-states during suspend / resume. I simply rmmod'ed the processor module
before suspend and the problem is solved as well. The cpuidle patches
make this problem more prominent due to the possible more direct switch
into lower power states, when we wait for a long time on something.I think we really should not fiddle with the various cpu states during
the critical parts of suspend / resume. Let's keep it simple. We have
the same policy during boot and I think the suspend / resume critical
parts have similar constraints.tglx
-
So, perhaps we can add a .suspend()/.resume() routines to the processor driver
and use them to disable/enable the cpuidle functionality during aI completely agree.
Greetings,
Rafael
-
Rafael,
http://tglx.de/private/tglx/p.diff
untested yet, but I'm on the way to do that :)
tglx
-
Thomas,
Heh, I thought of the same thing. :-)
Greetings,
Rafael
-
Yes. We're executing _INI from the CPU initialization code and that shouldn't
Sure.
Greetings,
Rafael
-
Rafael,
If I tear down CPU#1 right before I tell the kernel to hibernate, then
the box must explode in the same way. It does not. On none of 4 tested
laptops.Of course only the jinxed VAIO one exposes the "please press a key
problem".I need to follow down the swsusp_suspend() code path to figure out, why
this breaks the box.tglx
-
Thomas,
No, we don't. sysdevs are _not_ suspended in device_suspend().
They are suspended in device_power_down(), which is calledNo, the timekeeping is suspended in device_power_down() (or at least it should
Actually, we can't do this here, because of ACPI and some interrupt handling
related problems. Unfortunately, platform_finish() needs to go _after_
enable_nonboot_cpus() and device_resume() needs to go after platform_finish().
Analogously, disable_nonboot_cpus() has to go after platform_prepare().Hm, I really don't know why it helps, but that's not because of the timekeeping
Agreed.
Greetings,
Rafael
-
Rafael,
Well, I don't buy this one. The system would break in the same way, when
It is related. We rely on some subtle thing which is not up when we
Yes. It makes sense. When I change the TEST code path to:
- printk("swsusp debug: Waiting for 5 seconds.\n");
- msleep(5000);
+ printk("swsusp debug: before swsusp_suspend\n");
+ error = swsusp_suspend();then I have the same effect as I get from real hibernation. And we
actually shut down time keeping somewhere in that code path.ACPI: PCI interrupt for device 0000:00:1b.0 disabled
swsusp debug: before swsusp_suspend
Suspend timekeeping
swsusp: critical section:
swsusp: Need to copy 112429 pages
swsusp: Normal pages needed: 35399 + 1024 + 40, available pages: 193876
swsusp: critical section: done (112429 pages copied)
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Resume timekeeping
ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 16
-> works fineThis is with my patch applied. Without that I get:
CPU1 is down
swsusp debug: before swsusp_suspend
Suspend timekeeping
swsusp: critical section:
swsusp: Need to copy 112429 pages
swsusp: Normal pages needed: 35399 + 1024 + 40, available pages: 193876
swsusp: critical section: done (112429 pages copied)
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Resume timekeeping
Enabling non-boot CPUs
--> Waits for ever until a key is pressedThanks,
tglx
-
Thomas,
Can you please run one more test?
Namely, without your debugging code in disk.c, please try
# echo shutdown > /sys/power/disk
# echo disk > /sys/power/stateGreetings,
Rafael
-
Thomas,
I was referring to the resume part. If we call enable_nonboot_cpus(), which
executes the _INI ACPI control method, after platform_finish(), which executes
the _WAK global ACPI control method, things will break. That already happenedExactly. timekeeping_suspend() is called from device_power_down(), which is
Well, perhaps there's something else that we should suspend late and resume
early, but we don't?Greetings,
Rafael
-
% uname -a
Linux turing-police.cc.vt.edu 2.6.23-rc6-mm1 #1 SMP PREEMPT Tue Sep 18 12:32:13 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
% uptime
15:11:48 up 36 min, 1 user, load average: 0.05, 0.11, 0.09Had a few issues, all self-inflicted (the biggie was 2 iptables modules that
needed fixing for the net namespace changes in git-net.patch - stuck an
&init_net in the right 2 places, a new #include line, and all was good).I *did* have an odd issue with an out-of-tree GPL driver for a webcam.
(It's from http://mxhaard.free.fr/ if anybody cares).The Makefile does this:
make -C /lib/modules/`uname -r`/build SUBDIRS=/home/valdis/src/gspcav1-20070508 CC=cc modules
and has this sort of stuff in it:
DEFINES += -DGSPCA_ENABLE_DEBUG
DEFINES += -DCONFIG_USB_GSPCA_MODULE=1 -DMODULE -D__KERNEL__
DEFINES += -DVID_HARDWARE_GSPCA=0xFF -DGSPCA_VERSION=\"$(VERSION)\"In -rc4-mm1, $DEFINES got added to the compile command - in -rc6-mm1, I
had to namually add a 'CFLAGS += $(DEFINES)' or GSPCA_VERSION came up undefined.The actual problematic code in the Makefile:
-- begin Makefile snippet
ifneq ($(KERNELRELEASE),) # We were called by kbuild
CFLAGS += $(DEFINES)
obj-m += gspca.o
gspca-objs := gspca_core.o decoder/gspcadecoder.oelse # We were called from command line
KERNEL_VERSION = `uname -r`
KERNELDIR := /lib/modules/$(KERNEL_VERSION)/build
PWD := $(shell pwd)
MODULE_INSTALLDIR = /lib/modules/$(KERNEL_VERSION)/kernel/drivers/usb/media/
MODULE_INSTALLDIR2 = /lib/modules/$(KERNEL_VERSION)/kernel/drivers/media/video/default:
$(MAKE) -C $(KERNELDIR) SUBDIRS=$(PWD) CC=$(CC) modulesinstall:
mkdir -p $(MODULE_INSTALLDIR)
rm -f $(MODULE_INSTALLDIR)spca5xx.ko
rm -f $(MODULE_INSTALLDIR2)gspca.ko
install -c -m 0644 gspca.ko $(MODULE_INSTALLDIR)
/sbin/depmod -aeuninstall:
rm -f $(MODULE_INSTALLDIR)gspca.ko
/sbin/depmod -aqendif
--- end Makefile snippetThe Make definitely falls into the...
So the external module were fiddeling with CFLAGS which is wrong.
Yes - it worked before by accident.From Documentation/kbuild/makefiles.txt:
=====================================================================
--- 3.7 Compilation flagsEXTRA_CFLAGS, EXTRA_AFLAGS, EXTRA_LDFLAGS, EXTRA_ARFLAGS
All the EXTRA_ variables apply only to the kbuild makefile
where they are assigned. The EXTRA_ variables apply to all
commands executed in the kbuild makefile.$(EXTRA_CFLAGS) specifies options for compiling C files with
$(CC).Example:
# drivers/sound/emu10k1/Makefile
EXTRA_CFLAGS += -I$(obj)
ifdef DEBUG
EXTRA_CFLAGS += -DEMU10K1_DEBUG
endifThis variable is necessary because the top Makefile owns the
variable $(KBUILD_CFLAGS) and uses it for compilation flags for the
entire tree.
=====================================================================Nowhere are the use of CFLAGS documented.
The use of EXTRA_CFLAGS has been working as far back as I remember so
there is no backward compatibility issues to my best knowledge.Pelase ask the author of the module to either fix it or even better
submit the driver for inclusion because then we will find it during
the review pahse.But anyway - thanks for reporting an issue related to an external module.
Sam
-
I had a few issues with some other modules, but they're evil binaries so
I'm taking those up directly with the companies involved.. ;)
How annoying is this - and is this the CFLAGS thing again?
We could introduce some workaraound so we continue to respect
the CFLAGS settings in a Makefile for a while.
But at the end of the day we should convince the external module people
to follow the Kbuild docs. So it will then be temporary.Opinion?
Sam
-
The other issues were VMWare and the removal of EXPORT_SYMBOL(set_dumpable),
and NVidia graphics driver not playing nice with x86_64-mm-cpa-clflush.patch
(these actually broke back around -rc[34]-mm1, so it's not a new issue). I
just had to spend 5 minutes re-checking that my workarounds did/didn't needI already sent the author a patch for the broken Makefile. I don't think an
in-tree workaround is the right thing here. I've vote for an entry in the
"What's new in 2.6.2X" document when the kbuild changes go mainline, since
fixing it to use EXTRA_CFLAGS worked perfectly. External module maintainers
have (presumably) already read stable-API-nonsense, and should be used to
fixing code for new releases, so a 1 or 2 line Makefile tweak shouldn't be a
hardship.
Very good suggestion. I will try to make it happen when this hit mainline.
No but then on the other hand I do not want to make life too dificult for
the (non-binary) external modules.
I will keep an eye out for additional CFLAGS mis-use and if I get no more than
a few report I will no do the backward compatibility thing.Sam
-
Hi Andrew,
IA64 failed to build in kgdb with the messages
arch/ia64/kernel/kgdb.c: In function 'kgdb_arch_set_breakpoint':
arch/ia64/kernel/kgdb.c:555: warning: large integer implicitly truncated to unsigned type
arch/ia64/kernel/kgdb.c: At top level:
arch/ia64/kernel/kgdb.c:673: error: static declaration of 'kgdb_hwbreak_sstep' follows non-static declaration
include/asm/kgdb.h:33: error: previous declaration of 'kgdb_hwbreak_sstep' was here
make[1]: *** [arch/ia64/kernel/kgdb.o] Error 1
make: *** [arch/ia64/kernel] Error 2
make: *** Waiting for unfinished jobs....It's buried in git-kgdb.patch and I've cc'd who might be the owner. A candidate
fix is below. The build error seems to be a simple case of a badly placed
static. The warning about large integer truncating does not have an obvious
fix and I didn't want to hide the warning with a real bug.Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
kgdb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc6-mm1-010_fix_ehea_main/arch/ia64/kernel/kgdb.c linux-2.6.23-rc6-mm1-015_fix_ia64_kgdb/arch/ia64/kernel/kgdb.c
--- linux-2.6.23-rc6-mm1-010_fix_ehea_main/arch/ia64/kernel/kgdb.c 2007-09-18 11:29:26.000000000 +0100
+++ linux-2.6.23-rc6-mm1-015_fix_ia64_kgdb/arch/ia64/kernel/kgdb.c 2007-09-18 17:33:36.000000000 +0100
@@ -670,7 +670,7 @@ void kgdb_roundup_cpus(unsigned long fla
smp_send_nmi_allbutself();
}-static volatile int kgdb_hwbreak_sstep[NR_CPUS];
+volatile int kgdb_hwbreak_sstep[NR_CPUS];static int kgdb_notify(struct notifier_block *self, unsigned long cmd,
void *ptr)
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
-
That looks fishy to me. Why is it volatile-qualified? And does that
actually want to be a per-cpu var?
-
It turned out that it was unnecessary. A follow-up patch removed the volatile
--
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
-
ppc64 kgdb support is also broken but in a much more fundamental manner.
allmodconfig shows upIn file included from include/linux/kgdb.h:22,
from arch/powerpc/kernel/legacy_serial.c:15:
include/asm/kgdb.h:34: error: 'debugger' redeclared as different kind of symbol
include/asm/system.h:85: error: previous definition of 'debugger' was here
include/asm/kgdb.h:35: error: 'debugger_bpt' redeclared as different kind of symbol
include/asm/system.h:87: error: previous definition of 'debugger_bpt' was here
include/asm/kgdb.h:36: error: 'debugger_sstep' redeclared as different kind of symbol
include/asm/system.h:88: error: previous definition of 'debugger_sstep' was here
include/asm/kgdb.h:37: error: 'debugger_iabr_match' redeclared as different kind of symbol
include/asm/system.h:89: error: previous definition of 'debugger_iabr_match' was here
include/asm/kgdb.h:38: error: 'debugger_dabr_match' redeclared as different kind of symbol
include/asm/system.h:90: error: previous definition of 'debugger_dabr_match' was here
include/asm/kgdb.h:39: error: 'debugger_fault_handler' redeclared as different kind of symbol
include/asm/system.h:91: error: previous definition of 'debugger_fault_handler' was here
In file included from arch/powerpc/kernel/legacy_serial.c:15:
include/linux/kgdb.h:69: error: `BREAK_INSTR_SIZE' undeclared here (not in a function)
make[1]: *** [arch/powerpc/kernel/legacy_serial.o] Error 1
make: *** [arch/powerpc/kernel] Error 2I cleared the redeclarations up and put in some defines but there is
much more fundamental breakage likekernel/kgdb.c:122: error: `BUFMAX' undeclared here (not in a function)
kernel/kgdb.c:128: error: `NUMCRITREGBYTES' undeclared here (not in a function)
kernel/kgdb.c: In function `write_mem_msg':
kernel/kgdb.c:554: error: `CACHE_FLUSH_IS_SAFE' undeclared (first use in this function)
kernel/kgdb.c:554: error: (Each undeclared identifier is reported only once
kernel/kgdb.c:554: error: for each function it appears in.)
kernel/kgdb.c:...
Still complains with:
drivers/watchdog/core/watchdog_dev.c:84: warning: format '%i' expects type 'int', but argument 5 has type 'size_t'
which Satyam posted a patch for:
http://lkml.org/lkml/2007/9/3/212
and Wim replied that he had applied it... ???
http://lkml.org/lkml/2007/9/4/145 (maybe to wrong tree/branch ?)---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-
Please ignore. wrong directory :(
---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-
LD .tmp_vmlinux1
drivers/built-in.o: In function `acpi_ac_remove':
ac.c:(.text+0x3eeec): undefined reference to `power_supply_unregister'
drivers/built-in.o: In function `acpi_ac_add':
ac.c:(.text+0x3f10b): undefined reference to `power_supply_register'
make: *** [.tmp_vmlinux1] Error 1#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-rc6-mm1
# Tue Sep 18 12:42:42 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
CONFIG_USER_NS=y
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_TREE=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
# CONFIG_CONTAINERS is not set
CONFIG_SYSFS_DEPRECATED=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFI...
Hi Miles,
The problem appears to be that ACPI_AC is compiled in but POWER_SUPPLY is
a module. You could workaround it by making them both modules but I believe
the root cause may be a dependency problem. I've included a patch below that
when you apply and run make oldconfig should allow you to build your kernel.ACPI folks, is this a fix? If so, there may be other dependency issues that
allow a compiled-in feature to depend on a module.Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc6-mm1-015_fix_ia64_kgdb/drivers/acpi/Kconfig linux-2.6.23-rc6-mm1-020_fix_acpi/drivers/acpi/Kconfig
--- linux-2.6.23-rc6-mm1-015_fix_ia64_kgdb/drivers/acpi/Kconfig 2007-09-11 03:50:29.000000000 +0100
+++ linux-2.6.23-rc6-mm1-020_fix_acpi/drivers/acpi/Kconfig 2007-09-18 19:03:32.000000000 +0100
@@ -88,7 +88,7 @@ config ACPI_PROC_EVENTconfig ACPI_AC
tristate "AC Adapter"
- depends on X86
+ depends on X86 && POWER_SUPPLY
default y
help
This driver adds support for the AC Adapter object, which indicates
-
Hi Andrew,
PPC64 failed to build with the driver drivers/net/ehea with the
following errorCC [M] drivers/net/ehea/ehea_main.o
drivers/net/ehea/ehea_main.c: In function `ehea_open':
drivers/net/ehea/ehea_main.c:2322: error: implicit declaration of function `port_napi_enable'
drivers/net/ehea/ehea_main.c: At top level:
drivers/net/ehea/ehea_main.c:2340: error: conflicting types for 'port_napi_enable'
drivers/net/ehea/ehea_main.c:2322: error: previous implicit declaration of 'port_napi_enable' was here
make[1]: *** [drivers/net/ehea/ehea_main.o] Error 1It's buried in git-net.patch and I've cc'd a potential owner based simply
on the mention of EHEA in a leader. I've included a candidate fix for the
error. It seems to be a simple case of port_napi_enable being defined in
the wrong place.Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
ehea_main.c | 32 ++++++++++++++++----------------
1 file changed, 16 insertions(+), 16 deletions(-)diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc6-mm1-005_fix-rpadlpar_sysfs/drivers/net/ehea/ehea_main.c linux-2.6.23-rc6-mm1-010_fix_ehea_main/drivers/net/ehea/ehea_main.c
--- linux-2.6.23-rc6-mm1-005_fix-rpadlpar_sysfs/drivers/net/ehea/ehea_main.c 2007-09-18 11:29:28.000000000 +0100
+++ linux-2.6.23-rc6-mm1-010_fix_ehea_main/drivers/net/ehea/ehea_main.c 2007-09-18 17:15:27.000000000 +0100
@@ -2307,6 +2307,22 @@ out:
return ret;
}+static void port_napi_disable(struct ehea_port *port)
+{
+ int i;
+
+ for (i = 0; i < port->num_def_qps; i++)
+ napi_disable(&port->port_res[i].napi);
+}
+
+static void port_napi_enable(struct ehea_port *port)
+{
+ int i;
+
+ for (i = 0; i < port->num_def_qps; i++)
+ napi_enable(&port->port_res[i].napi);
+}
+
static int ehea_open(struct net_device *dev)
{
int ret;
@@ -2328,22 +2344,6 @@ static int ehea_open(struct net_device *
return ret;
}-static void port_napi_disable(struct ehea_port *port)
-{
- int i;
-
- for (i = 0; i <...
From: mel@skynet.ie (Mel Gorman)
Thanks for this fix, I'll merge it into the net-2.6.24 tree.
-
My apologies for the repost, this should have gone to netdev and Dave Miller
as well.--
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
-
Hi,
I get modpost errors here :
...
ERROR: "dvb_dmx_init" [drivers/media/video/video-buf-dvb.ko] undefined!
ERROR: "dvb_unregister_adapter" [drivers/media/video/video-buf-dvb.ko] undefined!
ERROR: "dvb_register_frontend" [drivers/media/video/video-buf-dvb.ko] undefined!
ERROR: "dvb_unregister_frontend" [drivers/media/video/video-buf-dvb.ko] undefined!
ERROR: "dvb_net_release" [drivers/media/video/video-buf-dvb.ko] undefined!
ERROR: "dvb_frontend_detach" [drivers/media/video/video-buf-dvb.ko] undefined!
ERROR: "dvb_dmxdev_release" [drivers/media/video/video-buf-dvb.ko] undefined!
ERROR: "dvb_dmx_swfilter" [drivers/media/video/video-buf-dvb.ko] undefined!
ERROR: "dvb_net_init" [drivers/media/video/video-buf-dvb.ko] undefined!
ERROR: "dvb_dmx_release" [drivers/media/video/video-buf-dvb.ko] undefined!
ERROR: "dvb_register_adapter" [drivers/media/video/video-buf-dvb.ko] undefined!
ERROR: "dvb_dmxdev_init" [drivers/media/video/video-buf-dvb.ko] undefined!
ERROR: "mt2131_attach" [drivers/media/video/cx23885/cx23885.ko] undefined!
ERROR: "s5h1409_attach" [drivers/media/video/cx23885/cx23885.ko] undefined!
CC arch/i386/boot/mca.o
CC arch/i386/boot/memory.o
CC arch/i386/boot/pm.o
make[1]: *** [__modpost] Fehler 1
make: *** [modules] Fehler 2...
config there : http://194.231.229.228/MM/2.6.23-rc6-mm1/config-modpost-errors
Regards,
Gabriel
-
Please cc: relevant people.
Was not sure who maintain this stuff..
It's not in the git-tree I had around.-
Sam,
This stuff is in -mm only right now.
It was a dependency problem, where a new driver (cx23885) is missing the
dependency on DVB_CORE.The fix is in this changeset:
http://linuxtv.org/hg/~mkrufky/build-fix/rev/67acaa106a99
Mauro,
Please pull from:
http://linuxtv.org/hg/~mkrufky/build-fix
for:
- VIDEO_CX23885 depends on DVB_CORE
Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)Regards,
Mike Krufky
-
Applied at my -git tree.
Cheers,
Mauro.-
Selecting Help for "Subarchitecture Type" causes "make menuconfig" to
crash, and the bash display settings have to be reset.Miles
-
Hi Miles.
Not reproduceable here.
But I noticed that we pass a null pointer to a vsprintf function which
in the cases you pointed out printed a (null) at my system.
Could you plase try if attached patch fix your system.Thanks,
Samdiff --git a/scripts/kconfig/mconf.c b/scripts/kconfig/mconf.c
index 2ee12a7..1935818 100644
--- a/scripts/kconfig/mconf.c
+++ b/scripts/kconfig/mconf.c
@@ -357,8 +357,9 @@ static void get_symbol_str(struct gstr *r, struct symbol *sym)
bool hit;
struct property *prop;- str_printf(r, "Symbol: %s [=%s]\n", sym->name,
- sym_get_string_value(sym));
+ if (sym && sym->name)
+ str_printf(r, "Symbol: %s [=%s]\n", sym->name,
+ sym_get_string_value(sym));
for_all_prompts(sym, prop)
get_prompt_str(r, prop);
hit = false;
-
Sorry, it still crashes. I am running Ubuntu pre-6.10 (Gutsy -- the
development version of the distro). Maybe I should try "make
mrproper" first?ncurses 5.6+20070716-1ubuntu1
bash 3.2-0ubuntu9
GNU Make 3.81
-
make mrproper should not do any difference here.
I rather think you hit some ncurses bug.If you could add '-g' to HOSTCFLAGS in top-level Makefile
and then do:
rm scripts/kconfig/mconf.o scripts/kconfig/mconf
make menuconfig(to build mconf and to check that the error is still reproduceable).
And then run it in a debugger like this:
gdb scripts/kconfig/mconf
run arch/x86_64/Kconfig
^^^^^^ replace with your actual archProvoke the error and get a back-trace with 'bt'.
Thanks,
Sam
-
Hi Sam,
I can reproduce this bug on Frugalware Linux.
Here the bt:
Program received signal SIGSEGV, Segmentation fault.
0xb7dc4143 in strlen () from /lib/libc.so.6
(gdb) bt
#0 0xb7dc4143 in strlen () from /lib/libc.so.6
#1 0x0804fd60 in str_append (gs=0xbfe4f6e8, s=0x0) at scripts/kconfig/util.c:87
#2 0x0804e0cb in expr_print (e=0x8e22df8, fn=0x804fda0 <expr_print_gstr_helper>, data=0xbfe4f6e8, prevtoken=0) at scripts/kconfig/expr.c:1037
#3 0x0804e1e7 in expr_gstr_print (e=0x8e22df8, gs=0xbfe4f6e8) at scripts/kconfig/expr.c:1099
#4 0x0804a07e in get_symbol_str (r=0xbfe4f6e8, sym=0x8b54ee8) at scripts/kconfig/mconf.c:334
#5 0x0804a363 in show_help (menu=0x8b54f88) at scripts/kconfig/mconf.c:738
#6 0x0804acec in conf (menu=0x8b69480) at scripts/kconfig/mconf.c:781
#7 0x0804a971 in conf (menu=0x8063c40) at scripts/kconfig/mconf.c:703
#8 0x0804af8a in main (ac=Cannot access memory at addre...
So the string is null and we call strlen(null) => boom.
Thanks - will post a fix in a few minutes.
Need to investigate a bit more why I do not see the crash even
though I updated to ubuntu gutsy to check it out?!?Sam
-
The crash is still there but the (null)'s are all fixed by this patch.
Gabriel
-
Got it. We sometimes get a numm pointer when printing the expression (in expr.c).
I have queued following fix.Thanks for reporting!
Sam
Patch is copy'n'pased due to temporary setup troubles after upgradign to Gutsy.
Will be at kbuil.git in a few minutes.From 69d39ec036b4fca541efc3c9ee31ec65d6b95bd4 Mon Sep 17 00:00:00 2001
From: Sam Ravnborg <sam@ravnborg.org>
Date: Wed, 19 Sep 2007 21:23:09 +0200
Subject: [PATCH] kconfig: fix segv fault in menuconfigWith specific configurations requesting help for certain
menu lines caused menuconfig to crash.
This was tracked down to a null pointer bug.
Thanks to "Miles Lane" <miles.lane@gmail.com> for inital reporting
and to Gabriel C <nix.or.die@googlemail.com> for the backtrace
that helped me locating the bug.Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
---
scripts/kconfig/mconf.c | 5 +++--
scripts/kconfig/util.c | 13 ++++++++-----
2 files changed, 11 insertions(+), 7 deletions(-)diff --git a/scripts/kconfig/mconf.c b/scripts/kconfig/mconf.c
index 2ee12a7..1935818 100644
--- a/scripts/kconfig/mconf.c
+++ b/scripts/kconfig/mconf.c
@@ -357,8 +357,9 @@ static void get_symbol_str(struct gstr *r, struct symbol *sym)
bool hit;
struct property *prop;- str_printf(r, "Symbol: %s [=%s]\n", sym->name,
- sym_get_string_value(sym));
+ if (sym && sym->name)
+ str_printf(r, "Symbol: %s [=%s]\n", sym->name,
+ sym_get_string_value(sym));
for_all_prompts(sym, prop)
get_prompt_str(r, prop);
hit = false;
diff --git a/scripts/kconfig/util.c b/scripts/kconfig/util.c
index e3f28b9..e1cad92 100644
--- a/scripts/kconfig/util.c
+++ b/scripts/kconfig/util.c
@@ -84,12 +84,15 @@ void str_free(struct gstr *gs)
/* Append to growable string */
void str_append(struct gstr *gs, const char *s)
{
- size_t l = strlen(gs->...
-
Thanks for testing.
Sam
-
Please cc: relevant people next time.
I will take a look.Thanks,
Sam
-
There seem to be a lot of config option help screens that are crashing
"make menuconfig".
I poked around a little and found a couple more:
Under CPU Scaling Frequency, "'userspace' governer for userspace
frequency scaling" help causes a crash.
Under IO Schedulers, if you have:
<M> Anticipatory I/O scheduler
<M> Deadline I/O scheduler
<*> CFQ I/O scheduler
Default I/O scheduler (CFQ) --->
and then select "Default I/O Scheduler (CFQ)" and then select help,
you'll get a crash.
I got another crash when I selected help for "Symmetric
multi-processing support".It seems likely that there are loads more.
-
So I do a 'make silentoldconfig', and I get:
*
* Generic Driver Options
*
path to uevent helper (UEVENT_HELPER_PATH) [/sbin/hotplug] (NEW)(Hey wow - the patch has a 'help' section, but no '?' listed because it's not a
bool or tristate. You can apparently get help during 'make menuconfig' but not
'make *oldconfig'. That rocks. ;)Yeeh. Hah. I don't *have* a /sbin/hotplug on this udev-based Fedora box. But
wait, reading the patch says that's usually during early boot. Now to figure
out what /sbin/nash does with its onboard 'hotplug' command... ;)(In other words, this *really* needs some better help documentation, to explain
who needs this, and who doesn't need it, and what happens if you select it
and you shouldn't have, and what happens if you leave it out when you should
have set it....)
Please cc: relevant people next time you complain about kconfig.
Thanks,
Sam
-
<snip>
Hi Andrew,
The 2.6.23-rc6-mm1build fails at
CC drivers/pci/hotplug/rpadlpar_core.o
CC drivers/pci/hotplug/rpadlpar_sysfs.o
drivers/pci/hotplug/rpadlpar_sysfs.c:132: error: unknown field `name' specified in initializer
drivers/pci/hotplug/rpadlpar_sysfs.c: In function `dlpar_sysfs_init':
drivers/pci/hotplug/rpadlpar_sysfs.c:142: error: structure has no member named `name'
make[3]: *** [drivers/pci/hotplug/rpadlpar_sysfs.o] Error 1
make[2]: *** [drivers/pci/hotplug] Error 2
make[1]: *** [drivers/pci] Error 2
make: *** [drivers] Error 2--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.-
This seems to be occuring across a number of the powerpc systems we test
with. That driver is a power dynamic lpar IO partitioning driver.Relevant Cc: added.
-apw
-
That's because somebody is breaking sysfs/kobject interfaces without
fixing all users :-) (Fair enough... it's just that we need to make sure
whoever takes care of that driver nowadays is aware of the breakage).Ben.
-
Hi Andrew,
Using the kobject_set_name function to set the kobject k_name.
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
---
--- linux-2.6.23-rc6/drivers/pci/hotplug/rpadlpar_sysfs.c
2007-09-18 14:56:05.000000000 +0530
+++ linux-2.6.23-rc6/drivers/pci/hotplug/~rpadlpar_sysfs.c
2007-09-18 16:51:55.000000000 +0530
@@ -129,17 +129,17 @@ struct kobj_type ktype_dlpar_io = {
};struct kset dlpar_io_kset = {
- .kobj = {.name = DLPAR_KOBJ_NAME,
- .ktype = &ktype_dlpar_io,
- .parent = &pci_hotplug_slots_subsys.kobj},
+ .kobj = {.ktype = &ktype_dlpar_io,
+ .parent = &pci_hotplug_slots_subsys.kobj},
.ktype = &ktype_dlpar_io,
};int dlpar_sysfs_init(void)
{
+ kobject_set_name(&dlpar_io_kset.kobj, DLPAR_KOBJ_NAME);
if (kset_register(&dlpar_io_kset)) {
printk(KERN_ERR "rpadlpar_io: cannot register kset for
%s\n",
- dlpar_io_kset.kobj.name);
+ dlpar_io_kset.kobj.k_name);
return -EINVAL;
}--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.-
Close, you should also use the kobject_name() function when referencing
it.I'll fix this up in the patch, thanks.
Oh, and sorry for breaking this, I could only test all of the modules
build on x86 due to traveling while creating this patch. I need to set
up some cross-compilers on my laptop to fix this issue :(thanks,
greg k-h
-
Yuck :-) Oh well, I hope you have a _FAST_ laptop :-)
Cheers,
Ben.-
Thanks.
Your email client replaces tabs with spaces, and is performing wordwrapping.
-
Probably this:
--- a/drivers/pci/hotplug/rpadlpar_sysfs.c~a
+++ a/drivers/pci/hotplug/rpadlpar_sysfs.c
@@ -129,8 +129,7 @@ struct kobj_type ktype_dlpar_io = {
};struct kset dlpar_io_kset = {
- .kobj = {.name = DLPAR_KOBJ_NAME,
- .ktype = &ktype_dlpar_io,
+ .kobj = {.ktype = &ktype_dlpar_io,
.parent = &pci_hotplug_slots_subsys.kobj},
.ktype = &ktype_dlpar_io,
};
_But I'm fed up with fixing that patch. It's Greg's turn.
-
But wait ... isn't that a statically-allocated kobject, which were
supposed to be "naughty" in the first place?
-
Yes it is, if you want to dynamically create it, please do. But there's
a lot of these in the kernel and I didn't want to break _all_ of them at
once, it's a multiple-stage set of patches to get us there (this name
array change is just the first in the set of these steps...)thanks,
greg k-h
-
Hi Greg,
Sorry for being late to reply, but do you still want such a patch (i.e.
convert static to dynamic allocation)?I read elsewhere on this thread that you'd merge Kamalesh's patch and
fix it up to also use kobject_name() yourself. But it's a small/trivial
driver, so I think just converting it to dynamic allocation right now
itself (when we've noticed it already) is probably better (?)[ BTW I don't see the fix in your git trees or quilt queue. So I'll
make a patch on top of 2.6.23-rc6-mm1 itself. ]Thanks,
Satyam
-
I'm behind in updating my patch queue, sorry, other things came up :(
thanks,
greg k-h
-
Make that "suspend-to-disk".
-
| David Miller | Re: Slow DOWN, please!!! |
| Greg Kroah-Hartman | [PATCH 013/196] Documentation: Replace obsolete "driverfs" with "sysfs". |
| James Bottomley | Re: Integration of SCST in the mainstream Linux kernel |
| Jeff Garzik | Re: [RFC] Heads up on sys_fallocate() |
git: | |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| Linus Torvalds | Re: [GIT]: Networking |
| Andrew Morton | Re: [BUG] New Kernel Bugs |
