ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc3/2.6.23-rc3-mm1/

- git-ixgbe.patch got dropped - git-net.patch destroyed it

- then git-net got dropped as it doesn't work

- the -mm import-to-git engine still isn't working

Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.

- To fetch an -mm tree using git, use (for example)

  git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
  git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1

- -mm kernel commit activity can be reviewed by subscribing to the
  mm-commits mailing list.

        echo "subscribe mm-commits" | mail majordomo@vger.kernel.org

- If you hit a bug in -mm and it is not obvious which patch caused it, it is
  most valuable if you can perform a bisection search to identify which patch
  introduced the bug. Instructions for this process are at

        http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt

  But beware that this process takes some time (around ten rebuilds and
  reboots), so consider reporting the bug first and if we cannot immediately
  identify the faulty patch, then perform the bisection search.

- When reporting bugs, please try to Cc: the relevant maintainer and mailing
  list on any email.

- When reporting bugs in this kernel via email, please also rewrite the
  email Subject: in some manner to reflect the nature of the bug. Some
  developers filter by Subject: when looking for messages to read.

- Occasional snapshots of the -mm lineup are uploaded to
  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced on
  the mm-commits list.

Changes since 2.6.23-rc2-mm2:

 origin.patch
 git-acpi.patch
 git-alsa.patch
 git-agpgart.patch
 git-audit-master.patch
 git-cpufreq.patch
 git-powerpc.patch
 git-dma.patch
 git-drm.patch
 git-dvb.patch
 git-hwmon.patch
 git-gfs2-nmw.patch
 git-hid.patch
 git-ieee1394.patch
 ...
allyesconfig on x86_64 says:

kernel/unwind.c:1016:31: error: undefined identifier '__builtin_labs'
kernel/unwind.c:1232:25: error: undefined identifier '__builtin_labs'

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-
On Wed, 22 Aug 2007 11:03:48 -0700

One wonders why x86_64-mm-unwinder.patch has an open-coded call to
__builtin_labs(), when include/linux/kernel.h:abs() should do a fine job.

And what's this stuff, anyway?

+typedef unsigned long uleb128_t;
+typedef signed long sleb128_t;
+#define sleb128abs __builtin_labs

unsigned and signed little-endian 128-bit types? Nope, they're 32-bit or
64-bit. All very mysterious.
-
dwarf2 uses a magic compressing encoding for numbers that uses fewer bytes
for small numbers and more bytes for larger numbers. These are the base
types for this. It's similar to fs/reiser4/dscale.h in your tree.

-Andi
-
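For readers unfamiliar with the encoding described above, here is a minimal userspace sketch of decoding DWARF's LEB128 variable-length integers. It is not the code from x86_64-mm-unwinder.patch: the names decode_uleb128 and decode_sleb128 are made up for this illustration, and the sketch assumes well-formed input that fits in 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode an unsigned LEB128 value starting at buf.
 * Each byte carries 7 payload bits, least significant group first;
 * the top bit is set on every byte except the last.
 * The number of bytes consumed is stored in *used.
 */
static uint64_t decode_uleb128(const uint8_t *buf, size_t *used)
{
	uint64_t value = 0;
	unsigned int shift = 0;
	size_t i = 0;
	uint8_t byte;

	do {
		byte = buf[i++];
		assert(shift < 64);	/* sketch only: input must fit in 64 bits */
		value |= (uint64_t)(byte & 0x7f) << shift;
		shift += 7;
	} while (byte & 0x80);

	*used = i;
	return value;
}

/*
 * Decode a signed LEB128 value: same byte layout, but bit 6 of the
 * final byte is the sign bit and is extended into the upper bits.
 */
static int64_t decode_sleb128(const uint8_t *buf, size_t *used)
{
	uint64_t value = 0;
	unsigned int shift = 0;
	size_t i = 0;
	uint8_t byte;

	do {
		byte = buf[i++];
		assert(shift < 64);
		value |= (uint64_t)(byte & 0x7f) << shift;
		shift += 7;
	} while (byte & 0x80);

	if (shift < 64 && (byte & 0x40))
		value |= ~(uint64_t)0 << shift;	/* sign-extend */

	*used = i;
	return (int64_t)value;
}
```

With this layout the value 2 occupies the single byte 0x02, while 624485 needs the three bytes 0xe5 0x8e 0x26, which is the "fewer bytes for small numbers" property.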
Hmm I use the same compiler from SUSE10.2 and it works for me (with both
mm and only my tree applied)
Ok mm fails with some errors in the wireless drivers but with
wireless disabled it compiles.
When you compile a simple test program like
#include <stdio.h>
int main(void) { printf("%ld\n", __builtin_labs(-1L)); return 0; }
Andrew, I actually checked that and the abs() there is just abs()
not a labs(). So it wouldn't work on 64bit platform.
We could opencode it of course, but __builtin_labs should be really
there.
-Andi
-
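The abs() versus labs() point can be illustrated in userspace. In the sketch below, int_abs is a stand-in for an int-based abs() of the kind described in the message above (it is not copied from include/linux/kernel.h), and long_abs is a hypothetical open-coded alternative. Both names are assumptions made for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical int-based abs(): a long argument is converted to int
 * on the way in, so on LP64 targets its upper 32 bits are lost.
 * Like abs(), the result is undefined for INT_MIN.
 */
static int int_abs(int x)
{
	return x < 0 ? -x : x;
}

/*
 * Hypothetical open-coded long version: keeps the full width of long.
 * Like labs(), the result is undefined for LONG_MIN.
 */
static long long_abs(long x)
{
	return x < 0 ? -x : x;
}
```

On a target where long is wider than int, passing a value such as -LONG_MAX through int_abs() cannot give back LONG_MAX, whereas long_abs() and labs() can.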
Apparently, the b43 driver is expecting another version of mac80211.
This patch fixes the compilation, but I'm not sure what about the
functionality. ;-)
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
drivers/net/wireless/b43/main.c | 6 ++----
drivers/net/wireless/b43/xmit.c | 10 ++++------
2 files changed, 6 insertions(+), 10 deletions(-)
Index: linux-2.6.23-rc3-mm1/drivers/net/wireless/b43/main.c
===================================================================
--- linux-2.6.23-rc3-mm1.orig/drivers/net/wireless/b43/main.c
+++ linux-2.6.23-rc3-mm1/drivers/net/wireless/b43/main.c
@@ -1189,8 +1189,7 @@ static void b43_write_probe_resp_plcp(st
plcp.data = 0;
b43_generate_plcp_hdr(&plcp, size + FCS_LEN, rate);
- dur = ieee80211_generic_frame_duration(dev->wl->hw,
- dev->wl->if_id, size,
+ dur = ieee80211_generic_frame_duration(dev->wl->hw, size,
B43_RATE_TO_BASE100KBPS(rate));
/* Write PLCP in two parts and timing for packet transfer */
tmp = le32_to_cpu(plcp.data);
@@ -1246,8 +1245,7 @@ static u8 *b43_generate_probe_resp(struc
/* Set the frame control. */
hdr->frame_control = cpu_to_le16(IEEE80211_FTYPE_MGMT |
IEEE80211_STYPE_PROBE_RESP);
- dur = ieee80211_generic_frame_duration(dev->wl->hw,
- dev->wl->if_id, *dest_size,
+ dur = ieee80211_generic_frame_duration(dev->wl->hw, *dest_size,
B43_RATE_TO_BASE100KBPS(rate));
hdr->duration_id = dur;
Index: linux-2.6.23-rc3-mm1/drivers/net/wireless/b43/xmit.c
===================================================================
--- linux-2.6.23-rc3-mm1.orig/drivers/net/wireless/b43/xmit.c
+++ linux-2.6.23-rc3-mm1/drivers/net/wireless/b43/xmit.c
@@ -220,7 +220,6 @@ static void generate_txhdr_fw4(struct b4
} else {
int fbrate_base100kbps = B43_RATE_TO_BASE100KBPS(rate_fb);
txhdr->dur_fb = ieee80211_generic_frame_duration(dev->wl->hw,
- dev->wl->if_id,
fragment_len,
fbrate_base100kbps);
}
@@ -311,16 ...

There seems to be a screwup somehow. These mac80211 API functions were
recently changed to include the additional parameter. So it seems you carry
an old version of mac80211.

--
Greetings Michael.
-
I think what happened is that Andrew dropped Dave M.'s net tree. Since
mac80211 has been getting merged through Dave M., crucial bits are missing,
which then breaks the bits from wireless-dev.

Andrew, if you find that you need to drop git-net again then I'll be happy
to provide you with a wireless-dev patch that does not depend on Dave's
tree. The mm-master branch in wireless-dev has dropped those patches which
have gone to Dave M. in the hopes of avoiding conflicts. Dependencies are
another matter... :-)

John
--
John W. Linville
linville@tuxdriver.com
-
Hopefully git-net is less wrecked than it was yesterday. If things still play up I'll have a go at bodging it up a bit, perhaps by disabling netconsole. (Although now I think about it, the netconsole bug was mainly an ill-advised BUG_ON, fixable by using WARN_ON instead). -
Hello,
Got that on my laptop:
------------------------
| Locking API testsuite:
----------------------------------------------------------------------------
| spin |wlock |rlock |mutex | wsem | rsem |
--------------------------------------------------------------------------
A-A deadlock: ok | ok | ok | ok | ok | ok |
A-B-B-A deadlock: ok | ok | ok | ok | ok | ok |
A-B-B-C-C-A deadlock: ok | ok | ok | ok | ok | ok |
A-B-C-A-B-C deadlock: ok | ok | ok | ok | ok | ok |
A-B-B-C-C-D-D-A deadlock: ok | ok | ok | ok | ok | ok |
A-B-C-D-B-D-D-A deadlock: ok | ok | ok | ok | ok | ok |
A-B-C-D-B-C-D-A deadlock: ok | ok | ok | ok | ok | ok |
double unlock: ok | ok | ok | ok | ok | ok |
initialize held: ok | ok | ok | ok | ok | ok |
bad unlock order: ok | ok | ok | ok | ok | ok |
--------------------------------------------------------------------------
recursive read-lock: | ok | | ok |
recursive read-lock #2: | ok | | ok |
mixed read-write-lock: | ok | | ok |
mixed write-read-lock: | ok | | ok |
--------------------------------------------------------------------------
hard-irqs-on + irq-safe-A/12: ok | ok | ok |
soft-irqs-on + irq-safe-A/12: ok | ok | ok |
hard-irqs-on + irq-safe-A/21: ok | ok | ok |
soft-irqs-on + irq-safe-A/21: ok | ok | ok |
sirq-safe-A => hirqs-on/12: ok | ok |irq event stamp: 452
hardirqs last enabled at (452): [<c026ff85>] irqsafe2A_rlock_12+0x8d/0xcc
hardirqs last disabled at (451): [<c0115ce4>] cpu_clock+0xe/0x49
softirqs last enabled at (448): ...

Hi Mariusz,

FWIW, reverting softlockup-use-cpu_clock-instead-of-sched_clock.patch fixes
the problem here.

Regards,
Frederik
-
I get this during resume from suspend to RAM and during hibernation:

WARNING: at /home/rafael/src/mm/linux-2.6.23-rc3-mm1/arch/x86_64/kernel/smp.c:380 smp_call_function_single()

Call Trace:
 [<ffffffff8021a97d>] smp_call_function_single+0x52/0xff
 [<ffffffff8022e703>] task_rq_lock+0x3d/0x6f
 [<ffffffff80230fc0>] set_cpus_allowed+0xbf/0xcc
 [<ffffffff802141ab>] sc_freq_event+0x5f/0x63
 [<ffffffff80431c38>] notifier_call_chain+0x33/0x65
 [<ffffffff8024c11f>] __srcu_notifier_call_chain+0x4b/0x69
 [<ffffffff8024c14c>] srcu_notifier_call_chain+0xf/0x11
 [<ffffffff803bcfa4>] cpufreq_resume+0x131/0x157
 [<ffffffff8038151c>] __sysdev_resume+0x34/0x73
 [<ffffffff80381b76>] sysdev_resume+0x1f/0x61
 [<ffffffff803865e8>] device_power_up+0x9/0x10
 [<ffffffff80256620>] suspend_devices_and_enter+0xbf/0xf7
 [<ffffffff802567bb>] enter_state+0x163/0x1e5
 [<ffffffff802568e1>] state_store+0xa4/0xc2
 [<ffffffff802d7bc5>] subsys_attr_store+0x31/0x33
 [<ffffffff802d7e8d>] sysfs_write_file+0xe0/0x11c
 [<ffffffff80293b77>] vfs_write+0xc7/0x150
 [<ffffffff802940f8>] sys_write+0x47/0x70
 [<ffffffff8020bdce>] system_call+0x7e/0x83

Apparently, smp_call_function_single() is unhappy, because it's called with
interrupts disabled by sc_freq_event() executed (as a notifier) by
cpufreq_resume(). However, cpufreq_resume() is always run with one CPU on
line, so all this stuff should be handled differently.

Oh, dear.
-
Hello,

Got that on imac g3.

  CC      kernel/kgdb.o
kernel/kgdb.c: In function 'kgdb_handle_exception':
kernel/kgdb.c:940: error: invalid lvalue in unary '&'
kernel/kgdb.c:940: warning: type defaults to 'int' in declaration of '_o_'
kernel/kgdb.c:940: error: invalid lvalue in unary '&'
kernel/kgdb.c:940: warning: type defaults to 'int' in declaration of '_n_'
kernel/kgdb.c:940: error: invalid lvalue in unary '&'
kernel/kgdb.c:940: error: invalid lvalue in unary '&'
kernel/kgdb.c:940: error: invalid lvalue in unary '&'
kernel/kgdb.c:940: warning: type defaults to 'int' in declaration of 'type name'
make[1]: *** [kernel/kgdb.o] Error 1
make: *** [kernel] Error 2

Regards,
Mariusz
On Wed, 22 Aug 2007 21:04:28 +0200
I'm not surprised.
while (cmpxchg(&atomic_read(&debugger_active), 0, (procid + 1)) != 0) {
a) cmpxchg isn't available on all architectures
b) we can't just go and take the address of atomic_read()'s return value!
c) that's pretty ugly-looking stuff anyway.
-
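Point b) is easier to see in a userspace analogue. The sketch below uses C11 <stdatomic.h> rather than the kernel's atomic_t API, and try_become_master and release_master are hypothetical names. It only shows the general shape of racing to become the master processor by doing the compare-and-swap on the atomic object itself, instead of on a value that has already been read out of it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0 means "no master"; otherwise holds (cpu id + 1) of the current master. */
static atomic_int debugger_active;

/*
 * Try to claim the master role for procid.
 * Returns true only for the caller that swaps 0 -> procid + 1.
 */
static bool try_become_master(int procid)
{
	int expected = 0;

	/* The CAS operates on the atomic variable, which is an lvalue. */
	return atomic_compare_exchange_strong(&debugger_active, &expected,
					      procid + 1);
}

/* Give the master role back so another CPU can claim it. */
static void release_master(void)
{
	atomic_store(&debugger_active, 0);
}
```

This does not address point a): whether a compare-and-swap primitive is available on every architecture is a separate question from how it is spelled.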
Against the tip of the kernel + kgdb patches this config builds. I wonder
if the compiler, or the macros for atomic_read or cmpxchg, have changed
in the -mm tree. Perhaps it is not relevant though if you

It was available for all the archs that the kgdb had been implemented on

Perhaps yes, perhaps no I guess it depends on what actually gets
generated...

In the past the intent of this was to guard for the race to be the master
processor and looked like some attempt to do it

Perhaps there is a cleaner way to do the same thing and avoid the cmpxchg
altogether.

I used the attached patch to eliminate the cmpxchg operation.

Jason.
On Wed, 22 Aug 2007 17:44:12 -0500

eek. We're in the process of hunting down and eliminating exactly this
construct. There have been cases where the compiler cached the
atomic_read() result in a register, turning the above into an infinite loop.

Plus we should never add power-burners like that into the kernel anyway.
That loop should have a cpu_relax() in it. Which will also fix the
compiler problem described above.

Thirdly, please always add a newline when coding statements like that:

	while (expr())
		;
-
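A polling loop along the lines asked for here might look like the following in userspace C11. This is only a sketch: relax_hint is a made-up stand-in for cpu_relax() that acts as a compiler barrier and nothing more, wait_until_clear is a hypothetical name, and the retry bound is an addition so the example can be exercised without a second thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for cpu_relax(): here only a compiler barrier. */
static inline void relax_hint(void)
{
	atomic_signal_fence(memory_order_seq_cst);
}

/*
 * Poll until *flag reads as zero, giving up after max_spins retries.
 * The flag is reloaded through an atomic load on every iteration, so
 * the compiler may not keep a stale copy in a register.
 * Returns true if the flag was seen clear.
 */
static bool wait_until_clear(atomic_int *flag, unsigned long max_spins)
{
	while (atomic_load_explicit(flag, memory_order_acquire) != 0) {
		if (max_spins-- == 0)
			return false;
		relax_hint();
	}
	return true;
}
```

The loop body sits on its own lines, in keeping with the formatting request above.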
The other instances I found of the same problem in the kgdb core are fixed too. I merged all the changes into the for_mm branch in the kgdb git tree. Thanks, Jason. -
Where is the kgdb git tree?
-
Trying:

git clone http://master.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb.git
-
Why am I getting this when I do:

git clone http://master.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb.git

----------------------------------------------------------------------------
error: Couldn't get http://master.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb.git/refs/tags... for tags/v2.6.11
The requested URL returned error: 404
error: Could not interpret tags/v2.6.11 as something to pull
rm: cannot remove directory `/nethome/piet/Src/linux/git/jwessel/linux-2.6-kgdb/.git/clone-tmp': Directory not empty
/nethome/piet/Src/linux/git/jwessel$
----------------------------------------------------------------------------

We are seeing a problem under VMware where kernel text in the scheduler is
getting whacked with four null bytes written into the code. I thought I'd
use the current linux-2.6-kgdb.git tree and possibly the CONFIG_DEBUG_RODATA
patch to make kernel text read-only:

https://www.x86-64.org/pipermail/patches/2007-March/003666.html

I thought the kernel text was RO and gdb had to disable it to insert a
breakpoint.
-
See the URLs at the top of
http://git.kernel.org/?p=linux/kernel/git/jwessel/linux-2.6-kgdb.git;a=summary

---
~Randy
-
I have only ever used: git clone git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb.git Jason. -
If you are going to make all the kernel text RO, then you are going to have
to add some code to the kgdb write-memory path to unprotect a given page, or
all the breakpoint writes are going to fail. Alternatively you can use HW
breakpoints. But I have no idea whether your VMware-simulated hardware
emulates HW breakpoint registers or not.

Jason.
-
Hello,

Got that on athlon x86_32:

  CC [M]  drivers/net/wireless/rt2x00mac.o
drivers/net/wireless/rt2x00mac.c: In function `rt2x00mac_tx_rts_cts':
drivers/net/wireless/rt2x00mac.c:61: warning: passing arg 2 of `ieee80211_ctstoself_get' makes pointer from integer without a cast
drivers/net/wireless/rt2x00mac.c:61: warning: passing arg 3 of `ieee80211_ctstoself_get' makes integer from pointer without a cast
drivers/net/wireless/rt2x00mac.c:61: warning: passing arg 4 of `ieee80211_ctstoself_get' makes pointer from integer without a cast
drivers/net/wireless/rt2x00mac.c:61: warning: passing arg 5 of `ieee80211_ctstoself_get' from incompatible pointer type
drivers/net/wireless/rt2x00mac.c:61: error: too many arguments to function `ieee80211_ctstoself_get'
drivers/net/wireless/rt2x00mac.c:65: warning: passing arg 2 of `ieee80211_rts_get' makes pointer from integer without a cast
drivers/net/wireless/rt2x00mac.c:65: warning: passing arg 3 of `ieee80211_rts_get' makes integer from pointer without a cast
drivers/net/wireless/rt2x00mac.c:65: warning: passing arg 4 of `ieee80211_rts_get' makes pointer from integer without a cast
drivers/net/wireless/rt2x00mac.c:65: warning: passing arg 5 of `ieee80211_rts_get' from incompatible pointer type
drivers/net/wireless/rt2x00mac.c:65: error: too many arguments to function `ieee80211_rts_get'
make[3]: *** [drivers/net/wireless/rt2x00mac.o] Error 1
make[2]: *** [drivers/net/wireless] Error 2
make[1]: *** [drivers/net] Error 2
make: *** [drivers] Error 2

Regards,
Mariusz

Linux localhost 2.6.23-rc3-mm1 #2 PREEMPT Wed Aug 22 19:45:30 CEST 2007 i686 AMD Athlon(tm) XP 1700+ AuthenticAMD GNU/Linux

Gnu C                  3.4.6
Gnu make               3.81
binutils               2.17
util-linux             2.12r
mount                  2.12r
module-init-tools      3.2.2
e2fsprogs              1.39
nfs-utils              1.0.6
Linux C Library        2.5
Dynamic linker (ldd)   2.5
Procps                 3.2.7
Net-tools              1.60
Kbd ...
This has been fixed for quite some time already. John, I can't check this myself now, but which rt2x00 patches have gone into the -mm tree? Since I believe the patch that changed ieee80211_ctstoself_get was followed by a patch to fix rt2x00 within the same series... Ivo -
Ok. Thanks.

What about this one?

  CC [M]  drivers/net/wireless/zd1211rw-mac80211/zd_mac.o
drivers/net/wireless/zd1211rw-mac80211/zd_mac.c: In function `zd_op_erp_ie_changed':
drivers/net/wireless/zd1211rw-mac80211/zd_mac.c:822: error: `IEEE80211_ERP_CHANGE_PREAMBLE' undeclared (first use in this function)
drivers/net/wireless/zd1211rw-mac80211/zd_mac.c:822: error: (Each undeclared identifier is reported only once
drivers/net/wireless/zd1211rw-mac80211/zd_mac.c:822: error: for each function it appears in.)
drivers/net/wireless/zd1211rw-mac80211/zd_mac.c: At top level:
drivers/net/wireless/zd1211rw-mac80211/zd_mac.c:844: error: unknown field `erp_ie_changed' specified in initializer
drivers/net/wireless/zd1211rw-mac80211/zd_mac.c:844: warning: initialization from incompatible pointer type
make[4]: *** [drivers/net/wireless/zd1211rw-mac80211/zd_mac.o] Error 1
make[3]: *** [drivers/net/wireless/zd1211rw-mac80211] Error 2
make[2]: *** [drivers/net/wireless] Error 2
make[1]: *** [drivers/net] Error 2
make: *** [drivers] Error 2

Regards,
Mariusz
-
I'm not a zd1211rw developer, but from a quick look at the patch series it
seems that the mac80211 version in -mm1 does not contain the patch

[PATCH 4/4] mac80211: implement ERP info change notifications

But it does contain the zd1211rw patch:

[PATCH] zd1211rw-mac80211: use correct preambles for RTS/CTS frames

which depended on the above-mentioned mac80211 patch.

Just had a second thought about those rt2x00 compilation errors you
reported: the error is not caused by rt2x00 lagging behind mac80211 API
changes. Rather, the rt2x00 patches that follow the API changes are going
upstream, but the mac80211 API changes they depend on are not going anywhere.

It seems that mac80211 has not been updated in the -mm tree while the
drivers have been updated. This is causing the compilation errors for both
rt2x00 and zd1211rw. I'll bet that if you try any other mac80211 driver,
similar issues will arise.

Ivo
-
Yup. This also happens to the b43 driver, for example. Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth -
Andrew had a lot of problems working-out conflicts between wireless-dev and net-2.6.24. I have since taken steps to help with this, but I think his pull was from before the wireless-dev rebase. Hopefully the next -mm will be better. John -- John W. Linville linville@tuxdriver.com -
/home/devel/linux-mm/fs/xfs/xfs_bmap_btree.c: In function 'xfs_bmbt_set_allf':
/home/devel/linux-mm/fs/xfs/xfs_bmap_btree.c:2312: error: 'b' undeclared (first use in this function)
/home/devel/linux-mm/fs/xfs/xfs_bmap_btree.c:2312: error: (Each undeclared identifier is reported only once
/home/devel/linux-mm/fs/xfs/xfs_bmap_btree.c:2312: error: for each function it appears in.)
/home/devel/linux-mm/fs/xfs/xfs_bmap_btree.c: In function 'xfs_bmbt_disk_set_allf':
/home/devel/linux-mm/fs/xfs/xfs_bmap_btree.c:2372: error: 'b' undeclared (first use in this function)
make[3]: *** [fs/xfs/xfs_bmap_btree.o] Error 1
make[2]: *** [fs/xfs] Error 2
make[1]: *** [fs] Error 2
make: *** [_all] Error 2

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/
-
Build fix.

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/

Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>

--- linux-mm-clean/fs/xfs/xfs_bmap_btree.c	2007-08-22 12:20:35.000000000 +0200
+++ linux-mm/fs/xfs/xfs_bmap_btree.c	2007-08-22 12:15:52.000000000 +0200
@@ -2309,7 +2309,7 @@ xfs_bmbt_set_allf(
 		((xfs_bmbt_rec_base_t)blockcount &
 		(xfs_bmbt_rec_base_t)XFS_MASK64LO(21));
 #else	/* !XFS_BIG_BLKNOS */
-	if (ISNULLSTARTBLOCK(b)) {
+	if (ISNULLSTARTBLOCK(startblock)) {
 		r->l0 = ((xfs_bmbt_rec_base_t)extent_flag << 63) |
 			((xfs_bmbt_rec_base_t)startoff << 9) |
 			 (xfs_bmbt_rec_base_t)XFS_MASK64LO(9);
@@ -2369,7 +2369,7 @@ xfs_bmbt_disk_set_allf(
 		((xfs_bmbt_rec_base_t)blockcount &
 		(xfs_bmbt_rec_base_t)XFS_MASK64LO(21)));
 #else	/* !XFS_BIG_BLKNOS */
-	if (ISNULLSTARTBLOCK(b)) {
+	if (ISNULLSTARTBLOCK(startblock)) {
 		r->l0 = cpu_to_be64(
 			((xfs_bmbt_rec_base_t)extent_flag << 63) |
 			((xfs_bmbt_rec_base_t)startoff << 9) |
-
Hi Michal,

Thanks for the patch.

This would be a problem for 32bit machines without large blocksize support
(i.e. in our xfs tests: !XFS_BIG_BLKNOS => (BITS_PER_LONG == 32 &&
!defined(CONFIG_LBD))), which we obviously didn't do a build test for.

I'll check it into our local tree and push to the master branch for Andrew.

--Tim
-
Hi Andrew,

The following kernel BUG was raised when I tried compiling and booting a
ppc64 machine with the 2.6.23-rc3-mm1 kernel.

=================================================================
Freeing initrd memory: 908k freed
sysctl table check failed: /kernel .1 Writable sysctl directory
skb_over_panic: text:c0000000002bf840 len:139 put:29 head:c00000000ffe7400 data:c00000000ffe7400 tail:0x8b end:0x80 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:95!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=128 NUMA pSeries
Modules linked in:
NIP: c0000000003fd7c4 LR: c0000000003fd7c0 CTR: 80000000000f97dc
REGS: c0000000027f3850 TRAP: 0700 Not tainted (2.6.23-rc3-mm1-autokern1)
MSR: 8000000000029032 <EE,ME,IR,DR> CR: 24288024 XER: 00000010
TASK = c000000009fc0000[1] 'swapper' THREAD: c0000000027f0000 CPU: 0
GPR00: c0000000003fd7c0 c0000000027f3ad0 c000000000737710 0000000000000082
GPR04: 0000000000000001 0000000000000001 0000000000000000 c00000000062bb3c
GPR08: 0000000000000000 c00000000067b2e0 0000000000002100 c00000000077b110
GPR12: 0000000000004000 c000000000649f00 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 c00000000802a0c0 c00000000052ea30 c0000000005392a8
GPR24: c000000009f6b908 c0000000006a4000 c0000000026e2000 0000000000000020
GPR28: c00000000ffe746e 0000000000000004 c0000000006f5340 c000000009f6a900
NIP [c0000000003fd7c4] .skb_over_panic+0x50/0x58
LR [c0000000003fd7c0] .skb_over_panic+0x4c/0x58
Call Trace:
[c0000000027f3ad0] [c0000000003fd7c0] .skb_over_panic+0x4c/0x58 (unreliable)
[c0000000027f3b60] [c0000000002bf848] .kobject_uevent_env+0x4f8/0x528
[c0000000027f3c80] [c00000000032512c] .device_add+0x2bc/0x730
[c0000000027f3d50] [c000000000022330] .vio_register_device_node+0x1a4/0x274
[c0000000027f3e00] [c0000000005d34a8] .vio_bus_init+0xa0/0xec
[c0000000027f3e80] ...
gargh, sorry, that's probably due to my screwed up attempt to fix Kay's
screwed up gregkh-driver-driver-core-change-add_uevent_var-to-use-a-struct.patch.

Kay sent an update patch but it didn't arrive in time. Greg, if you haven't
yet merged that, please do so asap?

So what _should_ this:

--- a/arch/powerpc/kernel/vio.c~fix-4-gregkh-driver-driver-core-change-add_uevent_var-to-use-a-struct
+++ a/arch/powerpc/kernel/vio.c
@@ -373,7 +373,7 @@ static int vio_hotplug(struct device *de
 	dn = dev->archdata.of_node;
 	if (!dn)
 		return -ENODEV;
-	cp = of_get_property(dn, "compatible", &length);
+	cp = of_get_property(dn, "compatible", &env->buflen);
 	if (!cp)
 		return -ENODEV;
_

have done?
-
Does replacing "&length" with "NULL" work? That's what's in the updated patch. Thanks, Kay -
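One way to read this exchange: if the lookup stores the property length through its third argument, then passing the address of the buffer-length field overwrites the count of bytes already used, while passing NULL leaves it alone. The sketch below models only that idea. lookup_property, struct demo_env and the property value are all made up for illustration; this is not of_get_property() or the real uevent environment structure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a uevent-style environment buffer. */
struct demo_env {
	char buf[64];
	int buflen;	/* bytes of buf already used */
};

/*
 * Hypothetical lookup with an optional length out-parameter:
 * *lenp is written only when lenp is non-NULL.
 */
static const char *lookup_property(const char *name, int *lenp)
{
	static const char compat[] = "demo,device";	/* made-up value */

	if (strcmp(name, "compatible") != 0)
		return NULL;
	if (lenp)
		*lenp = (int)sizeof(compat);	/* length including the NUL */
	return compat;
}
```

A caller that does not need the length can pass NULL and keep its own bookkeeping intact; a caller that passes the address of env.buflen has that field replaced by the property length.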
Hi, Kay, replacing &length with NULL does not work for me. I get a message saying that init terminated with signal 7. -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -
Hi Balbir, ugh, I can't see what's going wrong here. Care to just "return 0" for the whole function, and try again? Just to rule out that this is the cause of the problem. Thanks, Kay -
Same here.. I went through the new add_uevent_var() code. The only change I found was that instead of using env->envp[env->envp_idx] as an argument to vsnprintf(), the code looks semantically the same. Even with those changes, the assignment of env->envp[env->envp_idx++] to &env->buf[ env->buflen] makes the semantics look similar. I verified that the arguments to add_uevent_var() are sane. So at this point, I am a little lost. I'll debug further and see if the socket -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -
Hi, Kay, I just confirmed, your NULL fix looks correct. The init got signal 7 problem occurs even if uevent_add_var() is commented out. I suspect that -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -
...
  CC      arch/i386/boot/cpu.o
  CC      arch/i386/boot/cpucheck.o
WARNING: "div64_64" [net/netfilter/xt_connbytes.ko] has no CRC!
  CC      arch/i386/boot/edd.o
  AS      arch/i386/boot/header.o
  CC      arch/i386/boot/main.o
...

config : http://194.231.229.228/kernel/mm/2.6.23-rc3-mm1/config
build-log: http://194.231.229.228/kernel/mm/2.6.23-rc3-mm1/build-log

Regards,
Gabriel
-
Hmm.. I don't know ( added netdev to Cc )

I got one more :

...
WARNING: "div64_64" [net/ipv4/tcp_cubic.ko] has no CRC!
...

Btw when modprobing these the kernel gets tainted

...
[ 5498.536055] nf_conntrack version 0.5.0 (10240 buckets, 40960 max)
[ 5498.554844] xt_connbytes: no version for "div64_64" found: kernel tainted.
-
cu
Adrian

<--  snip  -->

This patch makes the 64bit integers on 32bit architectures usable for all
C parsers that know about "long long".

Signed-off-by: Adrian Bunk <bunk@kernel.org>

---

 include/asm-arm/types.h      |   10 +++++++---
 include/asm-avr32/types.h    |   10 +++++++---
 include/asm-blackfin/types.h |   11 +++++++----
 include/asm-cris/types.h     |   10 +++++++---
 include/asm-frv/types.h      |   10 +++++++---
 include/asm-h8300/types.h    |   10 +++++++---
 include/asm-i386/types.h     |   10 +++++++---
 include/asm-m32r/types.h     |   11 ++++++++---
 include/asm-m68k/types.h     |   10 +++++++---
 include/asm-mips/types.h     |   10 +++++++---
 include/asm-parisc/types.h   |   10 +++++++---
 include/asm-powerpc/types.h  |    9 ++++++---
 include/asm-s390/types.h     |    9 ++++++---
 include/asm-sh/types.h       |   10 +++++++---
 include/asm-sh64/types.h     |   10 +++++++---
 include/asm-v850/types.h     |   10 +++++++---
 include/asm-xtensa/types.h   |   10 +++++++---
 17 files changed, 118 insertions(+), 52 deletions(-)

4b6826d7a2f5b54a6a3b1cfa8cd40b1b27621be0
diff --git a/include/asm-arm/types.h b/include/asm-arm/types.h
index 3141451..1dae25b 100644
--- a/include/asm-arm/types.h
+++ b/include/asm-arm/types.h
@@ -19,11 +19,15 @@ typedef unsigned short __u16;
 typedef __signed__ int __s32;
 typedef unsigned int __u32;
-#if defined(__GNUC__)
-__extension__ typedef __signed__ long long __s64;
-__extension__ typedef unsigned long long __u64;
+#if defined(__GNUC__) && defined(__STRICT_ANSI__)
+#define __extension_long_long __extension__
+#else
+#define __extension_long_long
 #endif
+__extension_long_long typedef __signed__ long long __s64;
+__extension_long_long typedef unsigned long long __u64;
+
 #endif /* __ASSEMBLY__ */
 /*
diff --git a/include/asm-avr32/types.h b/include/asm-avr32/types.h
index 8999a38..2c14f49 100644
--- a/include/asm-avr32/types.h
+++ b/include/asm-avr32/types.h
@@ -25,11 +25,15 @@ typedef ...
ah, yet another attempt at this stuff

you probably need to update linux/types.h as well
-mike
-
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
-
just grep for __GNUC__ ...

#if defined(__GNUC__) && !defined(__STRICT_ANSI__)
typedef __u64 uint64_t;
typedef __u64 u_int64_t;
typedef __s64 int64_t;
#endif

#if defined(__GNUC__) && !defined(__STRICT_ANSI__)
typedef __u64 __bitwise __le64;
typedef __u64 __bitwise __be64;
#endif

you've made available __u64 and __s64, but not the rest ...
-mike
-
Given that this patch (hopefully) fixes a problem in the current net-2.6.24 tree, I'm inclined to slip it into mainline immediately. But I'd like a better description, please. Which "non-gcc parser" are we talking about here? Something under ./scripts/. Well, please identify it, and describe what the problem is, and how the proposed patch will address it. Let's cc Sam too, as I guess he's the guy whose code just broke. Thanks. -
If my analysis is correct then genksyms fails to produce a CRC for div64_64
because genksyms does not know the __extension__ keyword. And this patch
just papers over the real bug, which is in genksyms - right?

So we should fix the root cause here.

Googling, I did not find a good description of where __extension__ can be
used, so I fail to see where in the parse.y file I shall add the keyword.

I think __extension__ may be used both as part of an expression AND as part
of a typedef (as in this case), but I wonder if this is all it is limited
to.

I would like to have this sorted out so we do not end up with a half-baked
solution, and the proposed patch is no good as it just papers over the real
bug.

Sam
-
Hi,
The grammar rules involving __extension__ are these (the lhs stems from
the standard directly):
external-declaration:
__extension__ external-declaration
struct-declaration:
__extension__ struct-declaration
nested-declaration:
__extension__ nested-declaration
unary-operator: one of
__extension__ __real__ __imag__
The first three allow putting __extension__ in front of any external or
local declaration (including decls inside blocks, in C99), as in:
{
x = 1+3;
__extension__ int y = 3;
x += y;
}
the last one defines __extension__ as an unary operator, which can be
applied to all cast-expressions (which in turn are just unary
expressions). E.g.:
x = 1 + __extension__ (2+3);
Note that the decls include the C99 nested-decls in for statements:
for (__extension__ long long i = 0; ...)
Note further that there's a small ambiguity in parsing when just looking
forward one token, namely between decl and expression, like in this
example:
{ __extension__ int i;
vs.
{ __extension__ i + 2;
Here you can't decide if __extension__ introduces an expression or a decl.
Probably doesn't matter for your parser. Hope this helps.
Ciao,
Michael.
-
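The typedef case that matters for this thread is the external-declaration form. Below is a small sketch of hiding the keyword from parsers that do not know it. DEMO_EXTENSION and the demo_* names are made up for this example, and the condition is deliberately simpler than the one in the patch under discussion, which also looks at __STRICT_ANSI__.

```c
#include <assert.h>

/*
 * Expand to __extension__ only for compilers that define __GNUC__;
 * everyone else sees a plain typedef.
 */
#if defined(__GNUC__)
#define DEMO_EXTENSION __extension__
#else
#define DEMO_EXTENSION
#endif

DEMO_EXTENSION typedef signed long long demo_s64;
DEMO_EXTENSION typedef unsigned long long demo_u64;

/* Trivial user of the typedef, so the 64-bit width can be exercised. */
static demo_u64 demo_mul(demo_u64 a, demo_u64 b)
{
	return a * b;
}
```

A parser that only ever sees the preprocessed text gets either "__extension__ typedef ..." or "typedef ...", so it has to understand the keyword only in the first configuration.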
I found only one gcc manual page on __extension__:
http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Alternate-Keywords.html#Alternate-Keywords
(also found for other gcc versions)

---
~Randy
-
It fixes a bug exposed by a -mm only patch, not by the net tree
It's about parsers like the Sun C compiler and the C parser shipped
with genksyms.
We can fix the C parser shipped with genksyms, but we have nearly the
same problem with userspace C parsers:
These are userspace headers, and we had a bug report that the Sun C
compiler was not able to compile some userspace code.
cu
Adrian
-
So it is about two bugs.
1) kbuild (genksyms) fails to generate CRC for some symbols
2) allow userspace to parse the header

As for 2 we already use sed to remove a lot of stuff in our headers, so why
do we use another approach here?

As for 1 I will try to teach genksyms to accept __extension__ but it seems
less trivial than I expected (must be fooling myself somehow).

Sam
-
the sed removes things permanently and is designed for scrubbing things that are kernel-only ... in this case, these typedefs are not kernel only, but exposed conditionally when the compiler/standard allows for it -mike -
This time it's the other way round:
We anyway need a way to hide __extension__ from non-gcc userspace C
cu
Adrian
-
OK.
I have anyway added support for __extension__ in genksyms.
See below patch.

Note: To try this patch out do the following in a fresh tree (no generated files):
$ rm scripts/genksyms/*_shipped
$ apply patch
$ make GENERATE_PARSER=1 ...

In kbuild.git the _shipped files are updated but that would just be noise here.

From: Sam Ravnborg <sam@ravnborg.org>
Date: Tue, 28 Aug 2007 20:28:55 +0200
Subject: [PATCH] kbuild: __extension__ support in genksyms (fix unknown CRC warning)

Recently the __extension__ keyword has been introduced in the kernel.
Teach genksyms about this keyword so it can generate correct CRC for
exported symbols that uses a symbol marked __extension__.
For now only the typedef variant:

    __extension__ typedef ...

is supported.
Later we may add more variants as needed.

This patch contains the actual source file changes.
The following patch will hold modifications to the generated files (*_shipped)
and only after the second patch the fix has effect.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
---
 scripts/genksyms/keywords.gperf |    1 +
 scripts/genksyms/parse.y        |    5 ++++-
 2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/scripts/genksyms/keywords.gperf b/scripts/genksyms/keywords.gperf
index c75e0c8..5ef3733 100644
--- a/scripts/genksyms/keywords.gperf
+++ b/scripts/genksyms/keywords.gperf
@@ -11,6 +11,7 @@ __attribute, ATTRIBUTE_KEYW
 __attribute__, ATTRIBUTE_KEYW
 __const, CONST_KEYW
 __const__, CONST_KEYW
+__extension__, EXTENSION_KEYW
 __inline, INLINE_KEYW
 __inline__, INLINE_KEYW
 __signed, SIGNED_KEYW
diff --git a/scripts/genksyms/parse.y b/scripts/genksyms/parse.y
index ca04c94..408cdf8 100644
--- a/scripts/genksyms/parse.y
+++ b/scripts/genksyms/parse.y
@@ -61,6 +61,7 @@ remove_list(struct string_list **pb, struct string_list **pe)
 %token DOUBLE_KEYW
 %token ENUM_KEYW
 %token EXTERN_KEYW
+%token EXTENSION_KEYW
 %token FLOAT_KEYW
 %token INLINE_KEYW
 %token INT_KEYW
@@ -110,7 +111,9 @@ ...
Got it with a randconfig
( http://194.231.229.228/kernel/mm/2.6.23-rc3-mm1/r/randconfig-8 )
...
net/ipv4/fib_trie.c: In function 'trie_rebalance':
net/ipv4/fib_trie.c:969: error: lvalue required as unary '&' operand
net/ipv4/fib_trie.c:971: error: lvalue required as unary '&' operand
net/ipv4/fib_trie.c:977: error: lvalue required as unary '&' operand
net/ipv4/fib_trie.c:980: error: lvalue required as unary '&' operand
net/ipv4/fib_trie.c: In function 'fib_insert_node':
net/ipv4/fib_trie.c:1034: error: lvalue required as unary '&' operand
net/ipv4/fib_trie.c:1034: error: lvalue required as unary '&' operand
net/ipv4/fib_trie.c:1034: error: lvalue required as unary '&' operand
net/ipv4/fib_trie.c: In function 'fn_trie_lookup':
net/ipv4/fib_trie.c:1498: error: lvalue required as unary '&' operand
net/ipv4/fib_trie.c:1502: error: lvalue required as unary '&' operand
net/ipv4/fib_trie.c:1502: error: lvalue required as unary '&' operand
net/ipv4/fib_trie.c:1503: error: lvalue required as unary '&' operand
net/ipv4/fib_trie.c: In function 'trie_leaf_remove':
net/ipv4/fib_trie.c:1539: error: lvalue required as unary '&' operand
net/ipv4/fib_trie.c:1539: error: lvalue required as unary '&' operand
net/ipv4/fib_trie.c:1539: error: lvalue required as unary '&' operand
net/ipv4/fib_trie.c:1554: error: lvalue required as unary '&' operand
net/ipv4/fib_trie.c: In function 'nextleaf':
net/ipv4/fib_trie.c:1706: error: lvalue required as unary '&' operand
net/ipv4/fib_trie.c:1743: error: lvalue required as unary '&' operand
net/ipv4/fib_trie.c: In function 'fib_trie_get_next':
net/ipv4/fib_trie.c:2046: error: lvalue required as unary '&' operand
net/ipv4/fib_trie.c: In function 'fib_trie_seq_show':
net/ipv4/fib_trie.c:2320: error: lvalue required as unary '&' operand
make[2]: *** [net/ipv4/fib_trie.o] Error 1
make[1]: *** [net/ipv4] Error 2
make: *** [net] Error 2
make: *** Waiting for unfinished jobs....
...
-
This is a side effect of the git-net removal; temporarily removing
immunize-rcu_dereference-against-crazy-compiler-writers.patch should
work around it.
cu
Adrian
-
Alternatively, the following one-line patch to net/ipv4/fib_trie.c
could be used.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
fib_trie.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -urpNa -X dontdiff linux-2.6.23-rc3-mm1/net/ipv4/fib_trie.c linux-2.6.23-rc3-mm1.compile/net/ipv4/fib_trie.c
--- linux-2.6.23-rc3-mm1/net/ipv4/fib_trie.c 2007-08-22 09:20:33.000000000 -0700
+++ linux-2.6.23-rc3-mm1.compile/net/ipv4/fib_trie.c 2007-08-22 09:47:33.000000000 -0700
@@ -94,7 +94,7 @@ typedef unsigned int t_key;
#define T_LEAF 1
#define NODE_TYPE_MASK 0x1UL
#define NODE_PARENT(node) \
- ((struct tnode *)rcu_dereference(((node)->parent & ~NODE_TYPE_MASK)))
+ ((struct tnode *)(rcu_dereference((node)->parent) & ~NODE_TYPE_MASK))
#define NODE_TYPE(node) ((node)->parent & NODE_TYPE_MASK)
-
...
After a first reading of this thread I had the impression that it was
about the compiler's behavior, but now it seems to me that this patch is
not an alternative but a 'must be': the only proper way of calling
rcu_dereference is with a variable instead of an expression? Am I right?
Regards,
Jarek P.
-
Yes, rcu_dereference() does indeed need to be invoked on an lvalue.
Thanx, Paul
-
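Editorial sketch of why the argument has to be an lvalue. This is not the kernel's definition of rcu_dereference(); LOAD_ONCE, TYPE_MASK, struct node and the two functions are hypothetical names, and the macro body is an assumption based on the "lvalue required as unary '&' operand" errors above, which suggest the macro takes the address of its argument. It relies on the GNU __typeof__ extension.

```c
#include <assert.h>

/* Hypothetical stand-in for an rcu_dereference()-style macro that reads
 * its argument once through a volatile pointer.  The '&' applied to the
 * argument is what makes an lvalue mandatory. */
#define LOAD_ONCE(p) (*(volatile __typeof__(p) *)&(p))

#define TYPE_MASK 0x1UL

struct node {
	unsigned long parent;	/* pointer bits plus a type tag in bit 0 */
};

/* Correct order: fetch the stored word (an lvalue) first, mask second. */
unsigned long node_parent(struct node *n)
{
	return LOAD_ONCE(n->parent) & ~TYPE_MASK;
	/* LOAD_ONCE(n->parent & ~TYPE_MASK) would not compile, because
	 * the masked expression is not an lvalue and has no address. */
}

/* Small helper so the behaviour can be checked without a real tree. */
unsigned long node_parent_demo(unsigned long raw)
{
	struct node n = { raw };

	return node_parent(&n);
}
```

With a stored word of 0x1001, node_parent_demo() strips the tag bit and yields 0x1000, which mirrors what the corrected NODE_PARENT() does before its cast to struct tnode *.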
CONFIG_SCSI_ADVANSYS=y && CONFIG_ISA=n results in :
...
drivers/built-in.o: In function `advansys_init':
advansys.c:(.init.text+0x38ea): undefined reference to `isa_register_driver'
advansys.c:(.init.text+0x38ff): undefined reference to `isa_register_driver'
advansys.c:(.init.text+0x3926): undefined reference to `isa_unregister_driver'
advansys.c:(.init.text+0x3930): undefined reference to `isa_unregister_driver'
drivers/built-in.o: In function `advansys_exit':
advansys.c:(.exit.text+0x340): undefined reference to `isa_unregister_driver'
advansys.c:(.exit.text+0x34a): undefined reference to `isa_unregister_driver'
make: *** [.tmp_vmlinux1] Error 1
...
I guess advansys_{init,exit} is missing some #ifdef's ..
config : http://194.231.229.228/kernel/mm/2.6.23-rc3-mm1/r/randconfig-9
Gabriel
-
That's one conclusion. I prefer to think that the ISA support should
behave the same as the PCI and EISA support:
----
When CONFIG_ISA is disabled, the isa_driver support will not be compiled
in. Define stubs so that we don't get link-time errors.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
diff --git a/include/linux/isa.h b/include/linux/isa.h
index 1b85533..b0270e3 100644
--- a/include/linux/isa.h
+++ b/include/linux/isa.h
@@ -22,7 +22,18 @@ struct isa_driver {
#define to_isa_driver(x) container_of((x), struct isa_driver, driver)
+#ifdef CONFIG_ISA
int isa_register_driver(struct isa_driver *, unsigned int);
void isa_unregister_driver(struct isa_driver *);
+#else
+static inline int isa_register_driver(struct isa_driver *d, unsigned int i)
+{
+ return 0;
+}
+
+static inline void isa_unregister_driver(struct isa_driver *d)
+{
+}
+#endif
#endif /* __LINUX_ISA_H */
--
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
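Editorial sketch of the stub pattern the patch above applies to isa.h, written so it builds outside the kernel. CONFIG_FOO_BUS, struct foo_driver and the foo_* functions are hypothetical names; only the shape (real prototypes when the option is set, static inline no-ops otherwise) is taken from the patch.

```c
#include <assert.h>

struct foo_driver {
	const char *name;
};

#ifdef CONFIG_FOO_BUS
/* Bus support is built in: the real functions live in another file. */
int foo_register_driver(struct foo_driver *drv, unsigned int ndev);
void foo_unregister_driver(struct foo_driver *drv);
#else
/* Bus support is compiled out: callers still compile and link, and
 * registration simply reports success without doing anything. */
static inline int foo_register_driver(struct foo_driver *drv,
				      unsigned int ndev)
{
	(void)drv;
	(void)ndev;
	return 0;
}

static inline void foo_unregister_driver(struct foo_driver *drv)
{
	(void)drv;
}
#endif

/* A driver init path needs no #ifdef of its own. */
int foo_client_init(void)
{
	static struct foo_driver drv = { "demo" };

	return foo_register_driver(&drv, 1);
}
```

The design choice is that the #ifdef lives once in the header, so drivers such as advansys do not need conditional blocks around every call.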
-
Got it with a randconfig
( http://194.231.229.228/kernel/mm/2.6.23-rc3-mm1/r/randconfig-9 )
( the patch from http://lkml.org/lkml/2007/8/22/273 is needed too, or
CONFIG_SCSI_ADVANSYS needs to be N )
...
ERROR: "slhc_init" [drivers/net/ppp_generic.ko] undefined!
ERROR: "slhc_remember" [drivers/net/ppp_generic.ko] undefined!
ERROR: "slhc_uncompress" [drivers/net/ppp_generic.ko] undefined!
ERROR: "slhc_free" [drivers/net/ppp_generic.ko] undefined!
ERROR: "slhc_compress" [drivers/net/ppp_generic.ko] undefined!
ERROR: "slhc_toss" [drivers/net/ppp_generic.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2
...
Regards,
Gabriel
-
08/22/07-07:01:07 building kernel - make bzImage
CHK include/linux/version.h
UPD include/linux/version.h
CHK include/linux/utsrelease.h
UPD include/linux/utsrelease.h
SYMLINK include/asm -> include/asm-x86_64
CC arch/x86_64/kernel/asm-offsets.s
arch/x86_64/kernel/asm-offsets.c:1: error: -mpreferred-stack-boundary=3 is not between 4 and 12
make[1]: *** [arch/x86_64/kernel/asm-offsets.s] Error 1
make: *** [prepare0] Error 2
08/22/07-07:01:08 Build the kernel. Failed rc = 2
08/22/07-07:01:08 build: Building kernel... Failed rc = 1
Failed and terminated the run
08/22/07-07:01:08 command complete: (1) rc=126 (TEST ABORT)
Fatal error, aborting autorun
config file at: http://test.kernel.org/abat/107411/build/dotconfig
gcc version is 3.4.4
This does not occur when using a cross-compiler gcc 3.4.0
--
Mel Gorman
-
On Wed, 22 Aug 2007 18:17:38 +0100
x86_64-mm-less-stack-alignment.patch has
cflags-y += $(call cc-option,-mpreferred-stack-boundary=3)
So we _should_ have detected that gcc didn't like =3, so it should not
have been used.
I am suspecting a kbuild glitch: asm-offsets.c tends to be handled in
special ways (ie: it's usually the thing which blows up first) so perhaps
it is somehow avoiding the above does-gcc-support-this test.
Suitable cc's added ;)
-
Reverting the patch does allow the kernel to build and boot on that machine. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -
On Wed, Aug 22, 2007 at 11:10:29AM -0700, Andrew Morton wrote:
It seems that this is a problem caused by the way we check for
compiler options in x86_64. Each compiler flag is checked for
individually and if available added to cflags-y, later that is
added to CFLAGS. However, this means that each flag is checked
in total isolation. On x86_64 (on this compiler at least) the
-mpreferred-stack-boundary and -m{32,64} flags are actually mutually
dependent: the alignment constraints vary based on the word size.
This leads to the compile failure:
# gcc -mpreferred-stack-boundary=3 -S -xc /dev/null -o FOO
# echo $?
0
# gcc -m64 -mpreferred-stack-boundary=3 -S -xc /dev/null -o FOO
/dev/null:1: error: -mpreferred-stack-boundary=3 is not between 4 and 12
# echo $?
1
In the main Makefile we always add each flag directly to CFLAGS,
which means we check them all in combination; perhaps this is
prudent here also? Either way I suspect that changing the -m64
check to add itself directly to CFLAGS will fix this for us.
-apw
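Editorial toy model of the probing problem described above, in C rather than make. toy_cc_accepts() is a hypothetical stand-in for running gcc and checking its exit status; it encodes only the one behaviour shown in the transcript (=3 is rejected when -m64 is also present) and says nothing about how real gcc versions behave in general.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for "run the compiler with these flags, look at $?".
 * Mimics only what the transcript above shows. */
static int toy_cc_accepts(const char *cmdline)
{
	int m64 = strstr(cmdline, "-m64") != NULL;
	int b3 = strstr(cmdline, "-mpreferred-stack-boundary=3") != NULL;

	return !(m64 && b3);
}

/* Probe a flag on its own, the way the cflags-y list was being built. */
int probe_isolated(const char *flag)
{
	return toy_cc_accepts(flag);
}

/* Probe a flag together with the flags already chosen. */
int probe_combined(const char *chosen, const char *flag)
{
	char cmdline[256];

	snprintf(cmdline, sizeof(cmdline), "%s %s", chosen, flag);
	return toy_cc_accepts(cmdline);
}
```

In this model the isolated probe accepts -mpreferred-stack-boundary=3, while the combined probe with -m64 already chosen rejects it, which is the mismatch that let the flag reach the real build.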
-
Ok that makes sense. Most people don't see it because they don't need -m64.
I fixed it up with
ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/cflags-probe
and then
ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/less-stack-alignment
(replacement for the mm patch)
Can you test?
-Andi
-
Something like this then:
[PATCH] x86_64: fix preferred-stack-boundary check
gcc interprets the -mpreferred-stack-boundary flag differently
depending on whether the -m64 option is present, as seen in the following:
# gcc -mpreferred-stack-boundary=3 -S -xc /dev/null -o FOO
# echo $?
0
# gcc -m64 -mpreferred-stack-boundary=3 -S -xc /dev/null -o FOO
/dev/null:1: error: -mpreferred-stack-boundary=3 is not between 4 and 12
# echo $?
1
Adding -m64 to CFLAGS lets cc-option do the right thing.
Thanks to Andy Whitcroft <apw@shadowen.org> for spotting the root cause.
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
---
diff --git a/arch/x86_64/Makefile b/arch/x86_64/Makefile
index 128561d..5402c0a 100644
--- a/arch/x86_64/Makefile
+++ b/arch/x86_64/Makefile
@@ -25,6 +25,8 @@ LDFLAGS := -m elf_x86_64
OBJCOPYFLAGS := -O binary -R .note -R .comment -S
LDFLAGS_vmlinux :=
CHECKFLAGS += -D__x86_64__ -m64
+AFLAGS += -m64
+CFLAGS += -m64
cflags-y :=
cflags-kernel-y :=
@@ -36,7 +38,6 @@ cflags-$(CONFIG_MCORE2) += \
$(call cc-option,-march=core2,$(call cc-option,-mtune=generic))
cflags-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=generic)
-cflags-y += -m64
cflags-y += -mno-red-zone
cflags-y += -mcmodel=kernel
cflags-y += -pipe
@@ -69,7 +70,6 @@ cflags-$(CONFIG_CC_STACKPROTECTOR_ALL) += $(shell $(CONFIG_SHELL) $(srctree)/scr
CFLAGS += $(cflags-y)
CFLAGS_KERNEL += $(cflags-kernel-y)
-AFLAGS += -m64
head-y := arch/x86_64/kernel/head.o arch/x86_64/kernel/head64.o arch/x86_64/kernel/init_task.o
-
OK - Andi decided to do this in a bit more invasive way but it looks OK. So disregard my patch. Sam -
The flag actually needs a recent gcc 4.3 snapshot (it's a new feature
the gcc developers added especially for the kernel :), so if this didn't
work it would fail on the vast majority of systems. Somehow it doesn't?
At least here it compiles fine.
I notice the final comma is missing. Mel, does it work when you change
the line to
cflags-y += $(call cc-option,-mpreferred-stack-boundary=3,)
If not, please run
gcc -O2 -mpreferred-stack-boundary=2 -S -xc /dev/null -o x.o
echo $?
What does the echo output?
-Andi
-
No, but that is hardly a surprise as it's looking like -m64 is the way
Repeating really but;
elm3b6:~# gcc -O2 -mpreferred-stack-boundary=2 -S -xc /dev/null -o x.o ; echo $?
0
elm3b6:~# gcc -m64 -O2 -mpreferred-stack-boundary=2 -S -xc /dev/null -o x.o ; echo $?
/dev/null:1: error: -mpreferred-stack-boundary=2 is not between 4 and 12
1
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
-
That patch is no longer in 2.6.23-rc3-mm1, my bootlog says:
[ 0.000000] Unknown boot option `sata_nv.swncq=1': ignoring
I could not find this patch in any of the git trees I looked at, and its
removal mail from mm-commits said:
"This patch was dropped because Changes in Jeff's tree destroyed it."
I only found out about the swncq=1 command line option yesterday and so
tested it for only one day. But I did not have any trouble with it, even
though my drive was made by Maxtor.
The chipset:
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
The drive:
Device Model: MAXTOR STM3320820AS
Serial Number: 5QF2E698
Firmware Version: 3.AAE
As it worked for me, I hope that patch will be picked up by someone. :)
Torsten
-
On Wed, 22 Aug 2007 19:24:39 +0200
This is a fairly regular occurrence in ata land: patches from maintainers
don't get merged, so I merge them for testing, then some fairly pointless
cleanup-style patch goes on a great tree-wide rampage thus destabilising
or simply destroying the more important, mysteriously-not-merged patch.
Nobody knows why this happens.
Peer and Kuan: can you please redo that patch against the current ata
development tree?
Thanks.
-
allyesconfig
RELOCS arch/i386/boot/compressed/vmlinux.relocs
WARNING: Absolute relocations present
Offset Info Type Sym.Value Sym.Name
c06018f3 02ee1f01 R_386_32 c14adad0 _sdata
Regards,
Michal
--
LOG
http://www.stardust.webpages.pl/log/
-
Yeah, that's Greg's pestiferous gregkh-driver-warn-when-statically-allocated-kobjects-are-used.patch acting up. I previously suggested that something like kallsyms_lookup() could be used for this, but I was cruelly ignored. -
Hello,
This is from x86_32 with gcc 3.4.6:
CC [M] sound/pci/hda/hda_codec.o
sound/pci/hda/hda_codec.c: In function `snd_hda_codec_free':
sound/pci/hda/hda_codec.c:517: sorry, unimplemented: inlining failed in call to 'free_hda_cache': function body not available
sound/pci/hda/hda_codec.c:534: sorry, unimplemented: called from here
sound/pci/hda/hda_codec.c:517: sorry, unimplemented: inlining failed in call to 'free_hda_cache': function body not available
sound/pci/hda/hda_codec.c:535: sorry, unimplemented: called from here
make[3]: *** [sound/pci/hda/hda_codec.o] Error 1
make[2]: *** [sound/pci/hda] Error 2
make[1]: *** [sound/pci] Error 2
make: *** [sound] Error 2
Regards,
Mariusz
-
At Wed, 22 Aug 2007 22:23:03 +0200,
Since it doesn't happen with gcc-4.x, this looks like a gcc-3.x
specific problem. Does the patch below fix it?
Takashi
diff -r db9001b20d29 pci/hda/hda_codec.c
--- a/pci/hda/hda_codec.c Wed Aug 22 14:19:45 2007 +0200
+++ b/pci/hda/hda_codec.c Wed Aug 22 23:06:00 2007 +0200
@@ -514,7 +514,7 @@ static int read_widget_caps(struct hda_c
static void init_hda_cache(struct hda_cache_rec *cache,
unsigned int record_size);
-static inline void free_hda_cache(struct hda_cache_rec *cache);
+static void free_hda_cache(struct hda_cache_rec *cache);
/*
* codec destructor
@@ -707,7 +707,7 @@ static void __devinit init_hda_cache(str
cache->record_size = record_size;
}
-static inline void free_hda_cache(struct hda_cache_rec *cache)
+static void free_hda_cache(struct hda_cache_rec *cache)
{
kfree(cache->buffer);
}
-
It happens because gcc doesn't see the whole file without
unit-at-a-time and we disable unit-at-a-time with gcc 3.4 on i386 due
to stack usage problems (and older GNU gcc versions don't support
unit-at-a-time at all).
cu
Adrian
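Editorial sketch of the shape of the fix discussed above: a static function that is called before its body appears is declared as plain static, so a compiler that processes one function at a time is never asked to inline a body it has not seen yet. struct cache_rec, free_cache() and cache_destroy_demo() are hypothetical names chosen to echo the hda_codec.c patch; this is ordinary user-space C, not the driver code.

```c
#include <assert.h>
#include <stdlib.h>

struct cache_rec {
	void *buffer;
};

/* Forward declaration without "inline": the caller below only needs
 * the prototype, and the compiler is free to emit a normal call. */
static void free_cache(struct cache_rec *cache);

/* Caller placed before the definition, as in the codec destructor. */
int cache_destroy_demo(void)
{
	struct cache_rec cache = { malloc(16) };

	if (!cache.buffer)
		return -1;	/* allocation failed, nothing to show */
	free_cache(&cache);
	return cache.buffer == NULL;
}

static void free_cache(struct cache_rec *cache)
{
	free(cache->buffer);
	cache->buffer = NULL;
}
```

An alternative with the same effect would be to move the definition above its first caller and keep it inline; which of the two reads better is a judgment call.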
-
Hi Jeremy,
arch/i386/kernel/alternative.c:alternative_instructions() doesn't
check for noreplace-smp before setting capability bits and freeing the
__smp_locks section.
Every call to alternatives_smp_unlock() checks for noreplace-smp
beforehand, so remove the check from there.
Boot tested on i386 with UP+noreplace-smp (lguest) and SMP (real hardware)
Regards,
Frederik
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
diff --git a/arch/i386/kernel/alternative.c b/arch/i386/kernel/alternative.c
index 9f4ac8b..7c5af80 100644
--- a/arch/i386/kernel/alternative.c
+++ b/arch/i386/kernel/alternative.c
@@ -221,9 +221,6 @@ static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end
u8 **ptr;
char insn[1];
- if (noreplace_smp)
- return;
-
add_nops(insn, 1);
for (ptr = start; ptr < end; ptr++) {
if (*ptr < text)
@@ -406,7 +403,7 @@ void __init alternative_instructions(void)
#endif
#ifdef CONFIG_SMP
- if (smp_alt_once) {
+ if (smp_alt_once && !noreplace_smp) {
if (1 == num_possible_cpus()) {
printk(KERN_INFO "SMP alternatives: switching to UP code\n");
set_bit(X86_FEATURE_UP, boot_cpu_data.x86_capability);
-
On Wed, 22 Aug 2007 22:25:51 +0200
umm, so? What happens then? What bug is being fixed here, and what are
You refer to rc3-mm1 and this is described as a "-mm patch" but it seems
to also be applicable to mainline?
-
That means that even when you specify noreplace_smp, some replacing takes
place anyway. One of the consequences, besides noreplace_smp not working
as expected, is that lguest crashes when you feed it an SMP kernel.
Hmm yes, my bad.
Regards,
Frederik
-
Hm. Is alt_smp_once useful?
J
-
[Added Gerd Hoffmann and Rusty Russell to cc]
It dies with:
[ 0.131000] SMP alternatives: switching to UP code
lguest: bad stack page 0xc057a000
I added a dump_stack on the Host, which gives:
[124320.090946] [<c01052f8>] dump_trace+0x65/0x1de
[124320.090956] [<c010548b>] show_trace_log_lvl+0x1a/0x2f
[124320.090970] [<c0105ea4>] show_trace+0x12/0x14
[124320.090975] [<c0105fcd>] dump_stack+0x16/0x18
[124320.090980] [<f888032c>] pin_page+0x5f/0xa3 [lg]
[124320.090993] [<f8880654>] pin_stack_pages+0x3a/0x4a [lg]
[124320.091004] [<f888007e>] guest_pagetable_clear_all+0x12/0x15 [lg]
[124320.091013] [<f887f81a>] do_hcall+0xb1/0x1cb [lg]
[124320.091021] [<f887fbbe>] do_hypercalls+0x28a/0x2a0 [lg]
[124320.091029] [<f887f2a2>] run_guest+0x24/0x492 [lg]
[124320.091037] [<f8881b48>] read+0x83/0x8f [lg]
[124320.091048] [<c0175a77>] vfs_read+0x8e/0x117
[124320.091054] [<c0175e99>] sys_read+0x3d/0x61
[124320.091059] [<c0104166>] sysenter_past_esp+0x6b/0xb5
[124320.091065] [<ffffe410>] 0xffffe410
[124320.091069] =======================
Now, the "SMP alternatives: switching to UP code" message made me wonder
if it had anything to do with the alternatives, so I tried disabling the
switch by passing noreplace_smp...
... But the message was displayed anyway (and the smp_locks section
freed), because the check my patch adds is not made.
With the patch, I can boot lguest with an SMP kernel if I pass
I can't figure what the use case is, debugging set aside, but there are
places (eg xen, __cpu_die) in the kernel calling
alternatives_smp_switch(1) at runtime. Passing smp-alt-once will prevent
the switch.
Regards,
Frederik
-
How odd! This means that the guest set the kernel to a stack which it hadn't mapped writable (or perhaps not mapped at all). I always run SMP kernels, and that seems a very strange side effect of a patching problem... Nonetheless, I did have a previous problem with a bug in the patching code which didn't show up native and did show up under lguest. Can you send your config? Do you need noreplace-smp even on 2.6.23-rc3, or only 2.6.23-rc3-mm1? Thanks, Rusty. -
I had time to investigate this a little further; it appears that
0xc057a000 is in fact the beginning of the __smp_locks section. The
function call responsible for the crash is in alternative_instructions():
free_init_pages("SMP alternatives",
(unsigned long)__smp_locks,
(unsigned long)__smp_locks_end);
I.e., if I comment this out, I can boot lguest without passing noreplace_smp.
BTW, to make things clear: the patch I sent does _not_ fix the
lguest/alternatives problem. It just makes noreplace_smp functional
Here it is:
I'll try ASAP.
Thanks,
Frederik
-
Ok, tested: I need noreplace-smp + patch to make it work on mainline too. Regards, Frederik -
If the stack pointer is 0xc057a000, then the first stack page is at
0xc0579000 (the stack pointer is decremented before use). Not
calculating this correctly caused guests with CONFIG_DEBUG_PAGEALLOC=y
to be killed with a "bad stack page" message: the initial kernel stack
was just preceding the .smp_locks section which
CONFIG_DEBUG_PAGEALLOC marks read-only when freeing.
Thanks to Frederik Deweerdt for the bug report!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
diff -r cb71c5b0bbb5 drivers/lguest/interrupts_and_traps.c
--- a/drivers/lguest/interrupts_and_traps.c Sun Aug 26 10:31:53 2007 +1000
+++ b/drivers/lguest/interrupts_and_traps.c Sun Aug 26 10:34:44 2007 +1000
@@ -270,8 +270,11 @@ void pin_stack_pages(struct lguest *lg)
/* Depending on the CONFIG_4KSTACKS option, the Guest can have one or
* two pages of stack space. */
for (i = 0; i < lg->stack_pages; i++)
- /* The stack grows *upwards*, hence the subtraction */
- pin_page(lg, lg->esp1 - i * PAGE_SIZE);
+ /* The stack grows *upwards*, so the address we're given is the
+ * start of the page after the kernel stack. Subtract one to
+ * get back onto the first stack page, and keep subtracting to
+ * get to the rest of the stack pages. */
+ pin_page(lg, lg->esp1 - 1 - i * PAGE_SIZE);
}
/* Direct traps also mean that we need to know whenever the Guest wants to use
-
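Editorial worked example of the arithmetic in the patch above, using the addresses from its description. stack_page() is a hypothetical helper, and rounding the address down to a page boundary is an assumption about what pinning a page amounts to; the two address expressions are the old and new arguments to pin_page().

```c
#include <assert.h>

#define PAGE_SIZE 4096UL

/* Page containing the address that would be pinned for stack page i.
 * fixed == 0: old expression, esp - i * PAGE_SIZE
 * fixed == 1: new expression, esp - 1 - i * PAGE_SIZE */
unsigned long stack_page(unsigned long esp, unsigned int i, int fixed)
{
	unsigned long addr = fixed ? esp - 1 - i * PAGE_SIZE
				   : esp - i * PAGE_SIZE;

	return addr & ~(PAGE_SIZE - 1);
}
```

With esp = 0xc057a000 the old expression selects page 0xc057a000, which is the page after the stack, while the new one selects 0xc0579000 for i = 0 and 0xc0578000 for i = 1.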
Hello Rusty,
I could only just try the patch, sorry for the delay. Although it lets
the boot process progress a little further, lguest still seems to like
that "section that was just freed" :)
Please note that:
- It could progress to "Freeing SMP alternatives: 13k freed", which is new.
Indeed, your patch made the Host pin 0xc04d3000, which is the
right page.
- 0xc04d4000 is the __smp_locks section:
$ objdump -h vmlinux
[...]
20 .data.init_task 00001000 c04d3000 004d3000 003d4000 2**2
CONTENTS, ALLOC, LOAD, DATA
21 .smp_locks 000036c8 c04d4000 004d4000 003d5000 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
[...]
[ 0.128503] SMP alternatives: switching to UP code
[ 0.132846] Freeing SMP alternatives: 13k freed
[ 0.135177] BUG: unable to handle kernel paging request at virtual address c04d4000
[ 0.135417] printing eip:
[ 0.135505] c01051df
[ 0.135564] *pde = 00005067
[ 0.135645] *pte = 004d4000
[ 0.135756] Oops: 0000 [#1]
[ 0.135825] PREEMPT SMP DEBUG_PAGEALLOC
[ 0.136039] Modules linked in:
[ 0.136205] CPU: 0
[ 0.136206] EIP: 0061:[<c01051df>] Not tainted VLI
[ 0.136207] EFLAGS: 00010097 (2.6.23-rc3 #5)
[ 0.136665] EIP is at dump_trace+0x5f/0x97
[ 0.136738] eax: c0614954 ebx: c04d3ffc ecx: c0497b00 edx: c04ef641
[ 0.136883] esi: c04d3000 edi: c04d3ffd ebp: c04d3da0 esp: c04d3d90
[ 0.137058] ds: 0069 es: 0069 fs: 00d8 gs: 0000 ss: 0069
[ 0.137235] Process swapper (pid: 0, ti=c04d3000 task=c04953e0 task.ti=c04d3000)
[ 0.137447] Stack: c0109d95 c0614954 c04953e0 00000000 c04d3db4 c010a1f1 c0497b00 c0614954
[ 0.137831] c0614954 c04d3dc4 c0140921 c0144252 c04959c8 c04d3dec c014272f c02eccf5
[ 0.138119] c04959c8 c0614938 c04d3dec 00000001 c04959c8 c0614938 c04953e0 c04d3e4c
[ 0.138497] Call Trace:
[ 0.138603] [<c0105231>] show_trace_log_lvl+0x1a/0x2f
[ 0.138798] [<c01052e1>] show_stack_log_lvl+0x9b/0xa3
[ ...
-
Yes, I got this too, then had to jump on a plane (and away from my test
box). Turns out this actually isn't my bug (yay!). See next patch...
Rusty.
-
We don't care if ebp is on the stack, we care about ebp + 4. Without
this, lguest (with CONFIG_DEBUG_LOCKDEP) can touch a page unmapped by
CONFIG_DEBUG_PAGEALLOC.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
diff -r b0b1ab8ecf48 arch/i386/kernel/traps.c
--- a/arch/i386/kernel/traps.c Fri Aug 31 03:25:06 2007 +1000
+++ b/arch/i386/kernel/traps.c Fri Aug 31 07:57:35 2007 +1000
@@ -100,7 +100,7 @@ print_context_stack(struct thread_info *tinfo,
unsigned long addr;
#ifdef CONFIG_FRAME_POINTER
- while (valid_stack_ptr(tinfo, (void *)ebp)) {
+ while (valid_stack_ptr(tinfo, (void *)ebp + 4)) {
unsigned long new_ebp;
addr = *(unsigned long *)(ebp + 4);
ops->address(data, addr);
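Editorial sketch of what the one-line change above does, using the numbers from the oops earlier in the thread (ti=c04d3000, fault at c04d4000). The bounds test is copied from the existing valid_stack_ptr(); THREAD_SIZE = 4096 is an assumption that matches the 0x1000-byte .data.init_task section shown above, and old_check()/new_check() are hypothetical wrappers.

```c
#include <assert.h>

#define THREAD_SIZE 4096UL	/* assumed: 4K stacks, as in the oops */

/* Same bounds test as the pre-patch valid_stack_ptr(). */
static int valid_stack_ptr(unsigned long tinfo, unsigned long p)
{
	return p > tinfo && p < tinfo + THREAD_SIZE - 3;
}

/* Before the patch: validate ebp, then read the word at ebp + 4. */
int old_check(unsigned long tinfo, unsigned long ebp)
{
	return valid_stack_ptr(tinfo, ebp);
}

/* After the patch: validate the address that is actually read. */
int new_check(unsigned long tinfo, unsigned long ebp)
{
	return valid_stack_ptr(tinfo, ebp + 4);
}
```

For tinfo = 0xc04d3000 and ebp = 0xc04d3ffc the old check passes even though ebp + 4 is 0xc04d4000, the first byte of the freed page; the new check rejects that frame and still accepts ebp = 0xc04d3ff8.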
-
Hmm.. This *really* cannot happen with a normal kernel - it implies that
the stack has crossed into an invalid page. Why is that allowed with
lguest? What kind of code could validly *ever* come in here and cause
problems?
I'm getting the nervous feeling that lguest is really doing things that
shouldn't be done, or is using normal kernel functions in ways that they
should not be used.
In other words, yes, we load off "ebp+4", but I really don't see it being
a valid situation where ebp itself isn't also a valid stack frame. The
stack is not sized for "off-by-one" errors - we're supposed to always
have plenty of stack space free, and if you care about "off-by-one",
you're not just living on the edge, you're way beyond it!
IOW, please explain why/how lguest ever triggers a case where this would
possibly matter!
Linus
-
AFAICT, a corrupt stack could lead us to touch a page which isn't
mapped. If we assume the stack isn't corrupt, we don't have to do the
head.S pushes a "$0" on the stack to stop the unwinder, lguest doesn't.
Here's the lguest fix, but I still think the real fix posted previously
is more important.
Cheers,
Rusty.
===
lguest doesn't terminate stack, upsets unwinder
Copy head.S, which puts a 0 on the stack to terminate ebp-chasing
backtrace code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
diff -r 926e5cc964fd drivers/lguest/lguest_asm.S
--- a/drivers/lguest/lguest_asm.S Fri Aug 31 08:02:08 2007 +1000
+++ b/drivers/lguest/lguest_asm.S Fri Aug 31 16:01:25 2007 +1000
@@ -19,6 +19,8 @@
movl $(init_thread_union+THREAD_SIZE),%esp
movl %esi, %eax
addl $__PAGE_OFFSET, %eax
+ /* Fake value to stop backtraces with CONFIG_FRAME_POINTER */
+ pushl $0
jmp lguest_init
/*G:055 We create a macro which puts the assembler code between lgstart_ and
-
The unwinder should stop when it sees an invalid frame pointer, and even
without the push 0 I'd have expected it to be invalid.
But I suspect lguest triggers another thing: you actually make the stack
start at the *very*top* of the stack area. Afaik, normal x86 does not. A
normal x86 kernel will start off with a pt_regs[] setup, I think - ie the
kernel stack is always set up so that it has the "return to user mode"
information.
And *that* difference may be what triggers this for lguest, even though it
can never trigger for a "real" kernel.
But your patch does improve the sanity checking of the frame pointer. That
said, I think the following patch improves it more: does this also work
for you? (Totally untested, but it looks like the RightThing(tm) to do)
Linus
---
diff --git a/arch/i386/kernel/traps.c b/arch/i386/kernel/traps.c
index cfffe3d..b9998f3 100644
--- a/arch/i386/kernel/traps.c
+++ b/arch/i386/kernel/traps.c
@@ -100,10 +100,10 @@ asmlinkage void machine_check(void);
int kstack_depth_to_print = 24;
static unsigned int code_bytes = 64;
-static inline int valid_stack_ptr(struct thread_info *tinfo, void *p)
+static inline int valid_stack_ptr(struct thread_info *tinfo, void *p, unsigned size)
{
return p > (void *)tinfo &&
- p < (void *)tinfo + THREAD_SIZE - 3;
+ p <= (void *)tinfo + THREAD_SIZE - size;
}
static inline unsigned long print_context_stack(struct thread_info *tinfo,
@@ -113,7 +113,7 @@ static inline unsigned long print_context_stack(struct thread_info *tinfo,
unsigned long addr;
#ifdef CONFIG_FRAME_POINTER
- while (valid_stack_ptr(tinfo, (void *)ebp)) {
+ while (valid_stack_ptr(tinfo, (void *)ebp, 2*sizeof(unsigned long))) {
unsigned long new_ebp;
addr = *(unsigned long *)(ebp + 4);
ops->address(data, addr);
@@ -129,7 +129,7 @@ static inline unsigned long print_context_stack(struct thread_info *tinfo,
ebp = new_ebp;
}
#else
- while (valid_stack_ptr(tinfo, stack)) {
- while ...
-
This is only for the initial booting stack (init_thread_union); see
arch/i386/kernel/head.S:
/* Set up the stack pointer */
lss stack_start,%esp
...
pushl $0 # fake return address for unwinder
...
.data
ENTRY(stack_start)
.long init_thread_union+THREAD_SIZE
.long __BOOT_DS
lguest_asm.S missed the pushl $0 (lguest doesn't boot via head.S. I'd
like to change that for 2.6.24, but it involved perturbing that code so
*((unsigned long *)ebp + 1)?
Thanks,
Rusty.
-
Ok, we should fix that. We should just make it look like all other stack
frames.
There is other code in the kernel that "knows" that all kernel stacks have
the fields for the user stack return on it, namely the ptrace code etc.
Now, the initial stack is hopefully never *accessed* by that kind of code,
Well, we might as well then just make the code readable instead. IOW, how
about this one, which just declares a structure that describes the stack
frame thing? That just makes everything clearer, since we can then use
"sizeof(that structure)" instead of using the magic "2*sizeof(unsigned
long)".
Linus
---
diff --git a/arch/i386/kernel/traps.c b/arch/i386/kernel/traps.c
index cfffe3d..47b0bef 100644
--- a/arch/i386/kernel/traps.c
+++ b/arch/i386/kernel/traps.c
@@ -100,36 +100,45 @@ asmlinkage void machine_check(void);
int kstack_depth_to_print = 24;
static unsigned int code_bytes = 64;
-static inline int valid_stack_ptr(struct thread_info *tinfo, void *p)
+static inline int valid_stack_ptr(struct thread_info *tinfo, void *p, unsigned size)
{
return p > (void *)tinfo &&
- p < (void *)tinfo + THREAD_SIZE - 3;
+ p <= (void *)tinfo + THREAD_SIZE - size;
}
+/* The form of the top of the frame on the stack */
+struct stack_frame {
+ struct stack_frame *next_frame;
+ unsigned long return_address;
+};
+
static inline unsigned long print_context_stack(struct thread_info *tinfo,
unsigned long *stack, unsigned long ebp,
struct stacktrace_ops *ops, void *data)
{
- unsigned long addr;
-
#ifdef CONFIG_FRAME_POINTER
- while (valid_stack_ptr(tinfo, (void *)ebp)) {
- unsigned long new_ebp;
- addr = *(unsigned long *)(ebp + 4);
+ struct stack_frame *frame = (struct stack_frame *)ebp;
+ while (valid_stack_ptr(tinfo, frame, sizeof(*frame))) {
+ struct stack_frame *next;
+ unsigned long addr;
+
+ addr = frame->return_address;
ops->address(data, addr);
/*
* break out of recursive entries (such as
* ...
-
Does this fix a real problem? Or is there just some redundancy?
Wouldn't it be better to put the noreplace_smp test in one place?
J
-
I agree, but I don't think it is doable (alt_smp_once comes to mind). I'll double check however. Thanks, Frederik -
config : http://194.231.229.228/kernel/mm/2.6.23-rc3-mm1/r/randconfig-18
...
drivers/char/nozomi.c:2204: error: expected expression before '__attribute__'
make[2]: *** [drivers/char/nozomi.o] Error 1
make[1]: *** [drivers/char] Error 2
make: *** [drivers] Error 2
make: *** Waiting for unfinished jobs....
...
-
__devexit should be __devexit_p --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** -
config : http://194.231.229.228/kernel/mm/2.6.23-rc3-mm1/r/randconfig-18
...
fs/xfs/xfs_bmap_btree.c: In function 'xfs_bmbt_set_allf':
fs/xfs/xfs_bmap_btree.c:2312: error: 'b' undeclared (first use in this function)
fs/xfs/xfs_bmap_btree.c:2312: error: (Each undeclared identifier is reported only once
fs/xfs/xfs_bmap_btree.c:2312: error: for each function it appears in.)
fs/xfs/xfs_bmap_btree.c: In function 'xfs_bmbt_disk_set_allf':
fs/xfs/xfs_bmap_btree.c:2372: error: 'b' undeclared (first use in this function)
make[2]: *** [fs/xfs/xfs_bmap_btree.o] Error 1
make[2]: *** Waiting for unfinished jobs....
CC fs/reiser4/safe_link.o
CC fs/reiser4/plugin/plugin.o
make[1]: *** [fs/xfs] Error 2
make[1]: *** Waiting for unfinished jobs....
...
-
patch is here: http://lkml.org/lkml/2007/8/22/153 --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** -
2.6.23-rc3-mm1/
After installing this new wonder kernel on my AMD-64 laptop, I discovered
that Beagle wouldn't start. While enjoying how fast my system felt ( :) )
I also discovered that Evolution wouldn't start because it was built with
mono integration.
Can't live without email, so I poked at it and discovered that if I run
mono applications (including Evolution) with the legacy memory layout,
they work. Like this:
setarch x86_64 -L evolution
This didn't happen on -rc2-mm2, so I think somebody changed something.
Mono claims to mmap with the MAP_32BIT option. In -rc3-mm1 strace shows
mono's mmap like this:
mmap(NULL, 65536, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x7fa21f5cb000
It's got MAP_32BIT, but that's not a 32-bit address...
--
Zan Lynx <zlynx@acm.org>
Thanks, it helps.
I'm thinking unkind thoughts about pie-executable-randomization.patch.
Below is a patch which removes
pie-executable-randomization.patch
pie-executable-randomization-fix.patch
pie-executable-randomization-fix-2.patch
from 2.6.23-rc3-mm1. 'twould be great if you could see if that fixes
things, thanks.
arch/ia64/ia32/binfmt_elf32.c | 2
arch/x86_64/mm/mmap.c | 107 ++++----------------------------
fs/binfmt_elf.c | 107 ++++++--------------------------
3 files changed, 38 insertions(+), 178 deletions(-)
diff -puN fs/binfmt_elf.c~revert-pie-executable-randomization fs/binfmt_elf.c
--- a/fs/binfmt_elf.c~revert-pie-executable-randomization
+++ a/fs/binfmt_elf.c
@@ -45,7 +45,7 @@
static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs);
static int load_elf_library(struct file *);
-static unsigned long elf_map (struct file *, unsigned long, struct elf_phdr *, int, int, unsigned long);
+static unsigned long elf_map (struct file *, unsigned long, struct elf_phdr *, int, int);
/*
* If we don't support core dumping, then supply a NULL so we
@@ -295,70 +295,33 @@ create_elf_tables(struct linux_binprm *b
#ifndef elf_map
static unsigned long elf_map(struct file *filep, unsigned long addr,
- struct elf_phdr *eppnt, int prot, int type,
- unsigned long total_size)
+ struct elf_phdr *eppnt, int prot, int type)
{
unsigned long map_addr;
- unsigned long size = eppnt->p_filesz + ELF_PAGEOFFSET(eppnt->p_vaddr);
- unsigned long off = eppnt->p_offset - ELF_PAGEOFFSET(eppnt->p_vaddr);
- addr = ELF_PAGESTART(addr);
- size = ELF_PAGEALIGN(size);
+ unsigned long pageoffset = ELF_PAGEOFFSET(eppnt->p_vaddr);
+ down_write(&current->mm->mmap_sem);
/* mmap() will return -EINVAL if given a zero size, but a
* segment with zero filesize is perfectly valid */
- if (!size)
- return addr;
-
- down_write(&current->mm->mmap_sem);
- /*
* total_size is the size of the ELF (interpreter) ...

Hi Zan,
thanks for an excellent bugreport. Rather than throwing the whole
pie-randomization and flexmmap support away, could you please test the
patch below and let me know whether it fixes all your issues? Thanks.
From: Jiri Kosina <jkosina@suse.cz>
Handle MAP_32BIT flags properly in x86_64 flexmmap
We need to handle MAP_32BIT flags of mmap() properly for 64bit
applications with flexible mmap layout.
This patch introduces an x86_64-specific version of
arch_get_unmapped_area_topdown() which differs from the generic one in
handling of the MAP_32BIT flag -- when this flag is passed to mmap(), we
fall back to the legacy layout for this particular mmap, which gives
the proper 32bit range.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
diff --git a/arch/x86_64/kernel/sys_x86_64.c b/arch/x86_64/kernel/sys_x86_64.c
index 4770b7a..0e44d08 100644
--- a/arch/x86_64/kernel/sys_x86_64.c
+++ b/arch/x86_64/kernel/sys_x86_64.c
@@ -16,6 +16,7 @@
#include <linux/file.h>
#include <linux/utsname.h>
#include <linux/personality.h>
+#include <linux/random.h>
#include <asm/uaccess.h>
#include <asm/ia32.h>
@@ -69,6 +70,7 @@ static void find_start_end(unsigned long flags, unsigned long *begin,
unsigned long *end)
{
if (!test_thread_flag(TIF_IA32) && (flags & MAP_32BIT)) {
+ unsigned long new_begin;
/* This is usually used needed to map code in small
model, so it needs to be in the first 31bit. Limit
it to that. This means we need to move the
@@ -78,6 +80,11 @@ static void find_start_end(unsigned long flags, unsigned long *begin,
of playground for now. -AK */
*begin = 0x40000000;
*end = 0x80000000;
+ if (current->flags & PF_RANDOMIZE) {
+ new_begin = randomize_range(*begin, *begin + 0x02000000, 0);
+ if (new_begin)
+ *begin = new_begin;
+ }
} else {
*begin = TASK_UNMAPPED_BASE;
*end = TASK_SIZE;
@@ -147,6 +154,97 @@ full_search:
}
}
+
+unsigned long
+arch_get_unmapped_area_topdown(struct file ...

[snip patch]
This does fix the mono problem. Thank you.
-- 
Zan Lynx <zlynx@acm.org>
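To make the range selection in find_start_end() easier to follow, here is a user-space sketch of the same decision. The 0x40000000/0x80000000 bounds are taken from the hunk above; everything prefixed SKETCH_ (including the placeholder values standing in for TASK_UNMAPPED_BASE and TASK_SIZE) and the function names are assumptions for illustration, not the kernel's definitions. Randomization is left out on purpose.

```c
#include <stdbool.h>

/* Illustrative constants; only the two LOW_* bounds come from the patch. */
#define SKETCH_MAP_32BIT      0x40ULL            /* assumed flag value */
#define SKETCH_LOW_BEGIN      0x40000000ULL
#define SKETCH_LOW_END        0x80000000ULL
#define SKETCH_UNMAPPED_BASE  0x2aaaaaaab000ULL  /* placeholder */
#define SKETCH_TASK_SIZE      0x800000000000ULL  /* placeholder */

struct search_range {
    unsigned long long begin;
    unsigned long long end;
};

/* A 64-bit task asking for MAP_32BIT gets the low window, all else the default. */
struct search_range pick_range(unsigned long long flags, bool ia32_task)
{
    struct search_range r;

    if (!ia32_task && (flags & SKETCH_MAP_32BIT)) {
        r.begin = SKETCH_LOW_BEGIN;
        r.end = SKETCH_LOW_END;
    } else {
        r.begin = SKETCH_UNMAPPED_BASE;
        r.end = SKETCH_TASK_SIZE;
    }
    return r;
}

/* True if [addr, addr + len) lies entirely inside the range. */
bool range_holds(struct search_range r, unsigned long long addr,
                 unsigned long long len)
{
    return addr >= r.begin && len <= r.end - r.begin && addr <= r.end - len;
}
```

With these definitions the address mono got back on the broken kernel, 0x7fa21f5cb000, falls outside the MAP_32BIT window, while anything starting at 0x40000000 with a 64 KiB length falls inside it.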
On Thu, 23 Aug 2007 11:28:25 +0200 (CEST)
arch/x86_64/kernel/sys_x86_64.c | 98 ++++++++++++++++++++++++++++++++++++++++
include/asm-x86_64/pgtable.h | 1
well that's another hunk of code for us to maintain and to slow all our computers down. It is far from obvious to me that the whole pie-randomization thing is worth merging. Why shouldn't we just drop the lot?
<looks at the changelog>
This patch is using mmap()'s randomization functionality in such a way that it maps the main executable of (specially compiled/linked -pie/-fpie) ET_DYN binaries onto a random address (in cases in which mmap() is allowed to perform a randomization). The code has been extracted from Ingo's exec-shield patch http://people.redhat.com/mingo/exec-shield/
that certainly doesn't tell anyone why we should merge this code into Linux. -
(some more CCs added) Hi Andrew, well, whenever it comes to address space layout randomization, a huge debate usually follows about whether it is needed or not. Some people think it's a useful and powerful security protection against 0day attacks, other people think that it's just fighting the bugs in userspace software in the wrong way. Opinions differ, that's why there is a way to turn the VA space randomization completely off trivially. We already have randomized stack, randomized mmap base, randomized vdso page in mainline kernel, but code and heap still stay on deterministic addresses. I think providing the possibility for users to have really full address space randomization (if they want to) is much better than providing the current slightly crippled state, when some parts of address space are randomized and some are not. Or do you think we should rather rip all the randomization out? And it's almost certain to me that users want this functionality - look at major distros. They seem to have out-of-tree patches to provide this functionality to their users, IMHO. Thanks, -- Jiri Kosina -
On Fri, 24 Aug 2007 02:09:59 +0200 (CEST) randomizing PIE's is as a whole worth getting right and in mainline. That means that ONLY the PIE text should be randomized, not that mmap should break ;) Randomizing address space is very widely recognized as being part of a whole set of things (and there's a lot of discussion about what that whole set should be, each vendor will say their solution should be part of that and that all others suck) that you need to do to make it a LOT harder to get a general purpose exploit working. (It's not foolproof; it's more comparable to a 4-tumbler number lock than to an iris scan; yet even a tumbler number lock makes it harder to break into your gym locker) -
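To put a rough number on the "number lock" comparison: the find_start_end() hunk earlier in the thread randomizes *begin within a 0x02000000-byte window. Assuming the result is page-granular and pages are 4 KiB (both assumptions on my part, not something stated in the patch), the number of possible start positions and the equivalent bits of entropy can be sketched like this. The function names are hypothetical.

```c
#include <stdint.h>

/* Number of page-aligned start positions inside a window of the given size. */
uint64_t slot_count(uint64_t window_bytes, uint64_t page_bytes)
{
    return page_bytes ? window_bytes / page_bytes : 0;
}

/* floor(log2(slots)): how many coin flips the attacker has to guess. */
unsigned entropy_bits(uint64_t slots)
{
    unsigned bits = 0;

    while (slots > 1) {
        slots >>= 1;
        bits++;
    }
    return bits;
}
```

Under those assumptions the 32 MiB window gives 8192 slots, i.e. 13 bits, which is in the same ballpark as a 4-digit combination lock (10000 combinations).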
Hi Andrew,
2.6.23-rc3-mm1 renders my machine unresponsive during bootup.
I've been observing this problem in 2.6.22-rc2-mm2 as well.
The boot log, the cpuinfo and .config have been appended at the
end.
After embedding a few debug printk's in start_kernel, I noticed that
the last function to be executed was calibrate_delay()
Further, from the mm-bisect, the following patch turned out to be the
culprit:
x86_64-dynticks-disable-hpet_id_legsup-hpets.patch
From: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/i386/kernel/hpet.c | 2 +-
1 files changed, 1 insertion(+), 1 deletion(-)
diff -puN arch/i386/kernel/hpet.c~x86_64-dynticks-disable-hpet_id_legsup-hpets arch/i386/kernel/hpet.c
--- a/arch/i386/kernel/hpet.c~x86_64-dynticks-disable-hpet_id_legsup-hpets
+++ a/arch/i386/kernel/hpet.c
@@ -336,7 +336,7 @@ int __init hpet_enable(void)
clocksource_register(&clocksource_hpet);
- if (id & HPET_ID_LEGSUP) {
+ if (0 && (id & HPET_ID_LEGSUP)) {
hpet_enable_int();
hpet_reserve_platform_timers(id);
/*
Any particular reason for this patch [There is no changelog :-)]?
Without this patch the mm-kernel seems to behave just fine for me.
Thanks and Regards
gautham.
------------------------------------------------------------------------------
Linux version 2.6.23-rc3-mm1 (ego@llm43.in.ibm.com) (gcc version 4.1.1 20070105 (Red Hat 4.1.1-52)) #18 SMP Thu Aug 23 15:54:03 IST 2007
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000100 - 000000000009a400 (usable)
BIOS-e820: 000000000009a400 - 00000000000a0000 (reserved)
BIOS-e820: 0000000000100000 - 00000000bffcba40 (usable)
BIOS-e820: 00000000bffcba40 - 00000000bffcee00 (ACPI data)
BIOS-e820: 00000000bffcee00 - 00000000c0000000 (reserved)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: ...

On Thu, 23 Aug 2007 16:54:23 +0530
Oh damn. That patch is required to prevent a boot-time div-by-zero on my old nocona machine. I have a new set of x86_64-dynticks patches from Thomas to look at so I guess it's reset-and-start-again time on that front. -
That patch was necessary due to a bug in the hpet code, which is resolved for quite a while (hopefully). It was not a divide by zero, it Yep. I dropped that patch in the series. tglx -
OK, so I don't actually *use* the irDA on my laptop for much, but I figure if I have the hardware, I should at least make sure the driver comes up. 23-rc3-mm1 causes massive spewage, apparently at least partially as a fall-out of the sysctl rework. Not sure if those caused the 'unknown symbol' issues or not. The 'cannot allocate memory' is somewhat odd too, I can't believe it was *really* out of memory while still in /etc/rc5.d when I have 2G of ram... kernel: [ 247.804000] NET: Registered protocol family 23 kernel: [ 247.804000] sysctl table check failed: /net/irda .3.412 Unknown sysctl binary path kernel: [ 247.804000] sysctl table check failed: /net/irda/discovery .3.412.1 Unknown sysctl binary path kernel: [ 247.804000] sysctl table check failed: /net/irda/devname .3.412.2 Unknown sysctl binary path kernel: [ 247.804000] sysctl table check failed: /net/irda/debug .3.412.3 Unknown sysctl binary path kernel: [ 247.804000] sysctl table check failed: /net/irda/fast_poll_increase .3.412.4 Unknown sysctl binary path kernel: [ 247.804000] sysctl table check failed: /net/irda/discovery_slots .3.412.5 Unknown sysctl binary path kernel: [ 247.804000] sysctl table check failed: /net/irda/discovery_timeout .3.412.6 Unknown sysctl binary path kernel: [ 247.804000] sysctl table check failed: /net/irda/slot_timeout .3.412.7 Unknown sysctl binary path kernel: [ 247.804000] sysctl table check failed: /net/irda/max_baud_rate .3.412.8 Unknown sysctl binary path kernel: [ 247.804000] sysctl table check failed: /net/irda/min_tx_turn_time .3.412.9 Unknown sysctl binary path kernel: [ 247.804000] sysctl table check failed: /net/irda/max_tx_data_size .3.412.10 Unknown sysctl binary path kernel: [ 247.804000] sysctl table check failed: /net/irda/max_tx_window .3.412.11 Unknown sysctl binary path kernel: [ 247.804000] sysctl table check failed: /net/irda/max_noreply_time .3.412.12 Unknown sysctl binary path kernel: [ 247.804000] sysctl table check failed: /net/irda/warn_noreply_time ...
Eric is a bit sadistic with sysctls. If sysctl_check_table() fails, irda_sysctl_register() will think there is no memory. As for unknown symbols, hell knows. Same as with the multiple sysctl errors. ... -
CONFIG_IRDA=m # # IrDA protocols # CONFIG_IRLAN=m CONFIG_IRNET=m CONFIG_IRCOMM=m CONFIG_IRDA_ULTRA=y # # IrDA options # CONFIG_IRDA_CACHE_LAST_LSAP=y CONFIG_IRDA_FAST_RR=y CONFIG_IRDA_DEBUG=y # # Infrared-port device drivers # # # SIR device drivers # CONFIG_IRTTY_SIR=m On my doesn't-complain -rc2-mm1 kernel: % lsmod | grep -i ir irnet 21409 0 ppp_generic 22177 1 irnet irtty_sir 8321 0 sir_dev 14473 1 irtty_sir ircomm_tty 35345 0 ircomm 20425 1 ircomm_tty irda 188973 5 irnet,irtty_sir,sir_dev,ircomm_tty,ircomm crc_ccitt 2817 1 irda My guess is that irda fails to insmod because of the sysctl errors, so when the other ir* modules try to load, they come up empty on symbols that irda provides.
On Thu, 23 Aug 2007 09:33:46 -0400 Cute. Eric, can you please suggest what we should do here? Yes, the ENOMEM is bogus. But irda_sysctl_register() saw a NULL return from register_sysctl_table() and simply has no clue why it failed, and is forced to assume ENOMEM. That's a design shortcoming in register_sysctl_table(), which should have returned an ERR_PTR. Doesn't matter much. -
Grumble. Ok. This is a two sided bug. The NET_IRDA define was not put in sysctl.h where it belongs, so I missed it when making the list of all existing binary sysctls. So really I need to update the sysctl_check tables to have the NET_IRDA numbers, because at least at first skim everything looks ok on the binary side. Patches to follow shortly. Eric -
I should say something about the return value issue. Currently the only time this matters is when someone messes up in development, and if it isn't an out of memory error we get messages in dmesg so it shouldn't be too hard to sort out. I agree it is a bit of a shortcoming that we can only return NULL and it might be worth changing that at some point. Perhaps when I introduce register_sysctl_path would be a good time. Going through all of the callers just to give a better return value when they can't do anything about it anyway seems to be a lot of work for a very minor improvement. Eric -
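For readers who haven't met the ERR_PTR convention mentioned above, here is a user-space sketch of the idea: an error code is smuggled through the pointer return value by using the top few thousand addresses, which never hold a valid object. The sketch_ names, the 4095 limit and the registration function are all assumptions made for the example; this is not the kernel's err.h and not a proposed register_sysctl_table() change.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_ERRNO 4095   /* assumed upper bound on errno values */

/* Encode a negative errno as a pointer. */
void *sketch_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* A pointer is an error if it falls in the last SKETCH_MAX_ERRNO addresses. */
int sketch_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Recover the negative errno from an error pointer. */
long sketch_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/*
 * Hypothetical registration routine: a failed table check and a failed
 * allocation are reported as different errors, so the caller no longer has
 * to guess ENOMEM from a bare NULL.
 */
void *sketch_register(int table_is_valid)
{
    void *header;

    if (!table_is_valid)
        return sketch_err_ptr(-EINVAL);
    header = malloc(16);
    if (!header)
        return sketch_err_ptr(-ENOMEM);
    return header;
}
```

A caller would test sketch_is_err() on the result and pass sketch_ptr_err() up the stack instead of hard-coding -ENOMEM.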
Grumble. These numbers should have been in sysctl.h from the
beginning if we ever expected anyone to use them. Oh well put
them there now so we can find them and make maintenance easier.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
include/linux/sysctl.h | 20 ++++++++++++++++++++
net/irda/irsysctl.c | 34 ++++++++++++++--------------------
2 files changed, 34 insertions(+), 20 deletions(-)
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 88f0941..77c9ae2 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -238,6 +238,7 @@ enum
NET_LLC=18,
NET_NETFILTER=19,
NET_DCCP=20,
+ NET_IRDA=412,
};
/* /proc/sys/kernel/random */
@@ -795,6 +796,25 @@ enum {
NET_BRIDGE_NF_FILTER_PPPOE_TAGGED = 5,
};
+/* proc/sys/net/irda */
+enum {
+ NET_IRDA_DISCOVERY=1,
+ NET_IRDA_DEVNAME=2,
+ NET_IRDA_DEBUG=3,
+ NET_IRDA_FAST_POLL=4,
+ NET_IRDA_DISCOVERY_SLOTS=5,
+ NET_IRDA_DISCOVERY_TIMEOUT=6,
+ NET_IRDA_SLOT_TIMEOUT=7,
+ NET_IRDA_MAX_BAUD_RATE=8,
+ NET_IRDA_MIN_TX_TURN_TIME=9,
+ NET_IRDA_MAX_TX_DATA_SIZE=10,
+ NET_IRDA_MAX_TX_WINDOW=11,
+ NET_IRDA_MAX_NOREPLY_TIME=12,
+ NET_IRDA_WARN_NOREPLY_TIME=13,
+ NET_IRDA_LAP_KEEPALIVE_TIME=14,
+};
+
+
/* CTL_FS names: */
enum
{
diff --git a/net/irda/irsysctl.c b/net/irda/irsysctl.c
index 957e04f..525343a 100644
--- a/net/irda/irsysctl.c
+++ b/net/irda/irsysctl.c
@@ -31,12 +31,6 @@
#include <net/irda/irda.h> /* irda_debug */
#include <net/irda/irias_object.h>
-#define NET_IRDA 412 /* Random number */
-enum { DISCOVERY=1, DEVNAME, DEBUG, FAST_POLL, DISCOVERY_SLOTS,
- DISCOVERY_TIMEOUT, SLOT_TIMEOUT, MAX_BAUD_RATE, MIN_TX_TURN_TIME,
- MAX_TX_DATA_SIZE, MAX_TX_WINDOW, MAX_NOREPLY_TIME, WARN_NOREPLY_TIME,
- LAP_KEEPALIVE_TIME };
-
extern int sysctl_discovery;
extern int sysctl_discovery_slots;
extern int sysctl_discovery_timeout;
@@ -94,7 +88,7 @@ static int do_devname(ctl_table *table, int write, struct file *filp,
/* ...

It turns out that the net/irda code didn't register any of
its binary paths in the global sysctl.h header file so
I missed them completely when making an authoritative list
of binary sysctl paths in the kernel. So add them to
the list of valid binary sysctl paths.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
kernel/sysctl_check.c | 19 +++++++++++++++++++
1 files changed, 19 insertions(+), 0 deletions(-)
diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index d5e0337..aa5b6f6 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -702,6 +702,24 @@ static struct trans_ctl_table trans_net_dccp_table[] = {
{}
};
+static struct trans_ctl_table trans_net_irda_table[] = {
+ { NET_IRDA_DISCOVERY, "discovery" },
+ { NET_IRDA_DEVNAME, "devname" },
+ { NET_IRDA_DEBUG, "debug" },
+ { NET_IRDA_FAST_POLL, "fast_poll_increase" },
+ { NET_IRDA_DISCOVERY_SLOTS, "discovery_slots" },
+ { NET_IRDA_DISCOVERY_TIMEOUT, "discovery_timeout" },
+ { NET_IRDA_SLOT_TIMEOUT, "slot_timeout" },
+ { NET_IRDA_MAX_BAUD_RATE, "max_baud_rate" },
+ { NET_IRDA_MIN_TX_TURN_TIME, "min_tx_turn_time" },
+ { NET_IRDA_MAX_TX_DATA_SIZE, "max_tx_data_size" },
+ { NET_IRDA_MAX_TX_WINDOW, "max_tx_window" },
+ { NET_IRDA_MAX_NOREPLY_TIME, "max_noreply_time" },
+ { NET_IRDA_WARN_NOREPLY_TIME, "warn_noreply_time" },
+ { NET_IRDA_LAP_KEEPALIVE_TIME, "lap_keepalive_time" },
+ {}
+};
+
static struct trans_ctl_table trans_net_table[] = {
{ NET_CORE, "core", trans_net_core_table },
/* NET_ETHER not used */
@@ -723,6 +741,7 @@ static struct trans_ctl_table trans_net_table[] = {
{ NET_LLC, "llc", trans_net_llc_table },
{ NET_NETFILTER, "netfilter", trans_net_netfilter_table },
{ NET_DCCP, "dccp", trans_net_dccp_table },
+ { NET_IRDA, "irda", trans_net_irda_table },
{ 2089, "nf_conntrack_max" },
{}
};
--
1.5.1.1.181.g2de0
-
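As a rough picture of why a missing table entry produces "Unknown sysctl binary path", here is a sketch of a lookup against a number-to-name table shaped like trans_net_irda_table above. I have not reproduced the real matching logic from kernel/sysctl_check.c; the struct, the three sample rows and the function name are simplified assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the patch's struct trans_ctl_table. */
struct sketch_trans {
    int ctl_name;
    const char *procname;
};

/* First three rows of the irda table; a NULL procname ends the list. */
const struct sketch_trans sketch_net_irda[] = {
    { 1, "discovery" },
    { 2, "devname" },
    { 3, "debug" },
    { 0, NULL }
};

/* Return 1 if the (number, name) pair is listed, 0 if it is unknown. */
int sketch_path_known(const struct sketch_trans *t, int ctl_name,
                      const char *procname)
{
    for (; t->procname; t++)
        if (t->ctl_name == ctl_name && strcmp(t->procname, procname) == 0)
            return 1;
    return 0;
}
```

Before the patch there was no irda table at all, so in this model every /net/irda entry would have come back as unknown, which is what the boot log showed.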
Applied both patches, and now all I get from irda at boot time now is this: [ 292.062000] irda_init() [ 292.063000] NET: Registered protocol family 23 [ 292.069000] IrCOMM protocol (Dag Brattli) [ 292.221000] PPP generic driver version 2.4.2 in other words, business as usual. Thanks. Feel free to stick this on both patches: Tested-By: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Thanks. It's good to have confirmation that my sysctl_check routine didn't find something else wrong. Eric -
If I understand the code, anything it whinges about is either an outright bug or it's a round of ammo already chambered. ;) As far as "something else wrong", I'm still seeing these in -rc3-mm1, but they've been reported before against -rc2-mm2, I think: [ 0.628000] sysctl table check failed: /kernel/ostype .1.1 Missing strategy [ 0.628000] sysctl table check failed: /kernel/osrelease .1.2 Missing strategy [ 0.628000] sysctl table check failed: /kernel/version .1.4 Missing strategy [ 0.628000] sysctl table check failed: /kernel/hostname .1.7 Missing strategy [ 0.628000] sysctl table check failed: /kernel/domainname .1.8 Missing strategy [ 0.628000] sysctl table check failed: /kernel/shmmax .1.34 Missing strategy [ 0.628000] sysctl table check failed: /kernel/shmall .1.41 Missing strategy [ 0.628000] sysctl table check failed: /kernel/shmmni .1.45 Missing strategy [ 0.628000] sysctl table check failed: /kernel/msgmax .1.35 Missing strategy [ 0.628000] sysctl table check failed: /kernel/msgmni .1.42 Missing strategy [ 0.628000] sysctl table check failed: /kernel/msgmnb .1.36 Missing strategy [ 0.628000] sysctl table check failed: /kernel/sem .1.43 Missing strategy And this isn't on an allyesconfig or allmodconfig. There may well be sysctl code I didn't hit - my /lib/modules/2.6.23-rc3-mm1 is only about 10M, and the Fedora kernels are weighing in at about 75M of /lib/modules a pop.
Interesting. No I haven't seen this one. This appears to be one of those silly little corner cases I failed to account for in my checks. It looks like you don't have CONFIG_SYSCTL_SYSCALL defined, and it appears utsname_syscall and ipcdata_syscall both become NULL pointers Yes. Thank you. I figure as long as we are reasonably close people we should catch most if not all things before this is merged into Linus's tree. Patch in a moment. Eric -
Yep. Nothing I actually use needs SYSCTL_SYSCALL, so I turned it off to see what breaks...
Other than glibc (which uses it to see if we are on an SMP system, and has a fallback to /proc/sys) I only found 5 other application binaries when I was looking hard. Eric -
Currently sysctl_check_table will complain if a strategy routine is
missing when we have sys_sysctl compiled out, or a proc_handler is
missing when we have procfs compiled out. At least some of the
custom handlers actually expand to NULL when this is the case so the
warning is actually a problem. So don't worry about missing strategy
routines, or missing proc_handler routines when they will never be
called.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
kernel/sysctl_check.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index aa5b6f6..10dd744 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -1552,14 +1552,18 @@ int sysctl_check_table(struct ctl_table *table)
set_fail(&fail, table, "No max");
}
}
+#ifdef CONFIG_SYSCTL_SYSCALL
if (table->ctl_name && !table->strategy)
set_fail(&fail, table, "Missing strategy");
+#endif
#if 0
if (!table->ctl_name && table->strategy)
set_fail(&fail, table, "Strategy without ctl_name");
#endif
+#ifdef CONFIG_PROC_FS
if (table->procname && !table->proc_handler)
set_fail(&fail, table, "No proc_handler");
+#endif
#if 0
if (!table->procname && table->proc_handler)
set_fail(&fail, table, "proc_handler without procname");
--
1.5.3.rc6.17.g1911
-
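The effect of the two #ifdefs can be modelled in user space by turning the config options into runtime flags. This is only a sketch of the rule the patch describes: struct sketch_entry and sketch_check() are invented for the example, and the real function reports through set_fail() rather than returning a string.

```c
#include <stddef.h>

/* Minimal stand-in for the fields of struct ctl_table that the check reads. */
struct sketch_entry {
    int ctl_name;           /* binary sysctl number, 0 if none */
    const char *procname;   /* /proc/sys name, NULL if none */
    int has_strategy;       /* non-zero if a strategy routine is set */
    int has_proc_handler;   /* non-zero if a proc_handler is set */
};

/*
 * Return the complaint the check would raise, or NULL if the entry passes.
 * sysctl_syscall_built and procfs_built stand in for CONFIG_SYSCTL_SYSCALL
 * and CONFIG_PROC_FS: a missing routine only matters if it could be called.
 */
const char *sketch_check(const struct sketch_entry *e,
                         int sysctl_syscall_built, int procfs_built)
{
    if (sysctl_syscall_built && e->ctl_name && !e->has_strategy)
        return "Missing strategy";
    if (procfs_built && e->procname && !e->has_proc_handler)
        return "No proc_handler";
    return NULL;
}
```

An entry like /kernel/ostype with no strategy routine is flagged when the sys_sysctl flag is on and passes when it is off, which is the false positive Valdis reported going away.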
I'm not seeing the false-positive msgs after applying this patch... Tested-By: Valdis Kletnieks <valdis.kletnieks@vt.edu>
3/2.6.23-rc3-mm1/ After applying Matthew Wilcox' patch to include/linux/isa.h this compiles and boots on my Intel/openSUSE 10.2 test machine but throws out the following messages I don't remember ever seeing with other kernels:
- on console early during boot, also in SuSE's /var/log/boot.msg:
your system time is not correct: Wed Jul 13 13:15:31 UTC 1910
setting system time to: Tue Jul 24 00:00:00 UTC 2007
- later, ditto on console and in /var/log/boot.msg:
FATAL: Error inserting processor (/lib/modules/2.6.23-rc3-mm1-testing/kernel/drivers/acpi/processor.ko): Input/output error
WARNING: Error inserting processor (/lib/modules/2.6.23-rc3-mm1-testing/kernel/drivers/acpi/processor.ko): Input/output error
FATAL: Error inserting thermal (/lib/modules/2.6.23-rc3-mm1-testing/kernel/drivers/acpi/thermal.ko): Unknown symbol in module, or unknown parameter (see dmesg)
- apparently corresponding to that, in dmesg:
<4>[ 7.691865] thermal: Unknown symbol acpi_processor_set_thermal_limit
- from fsck during boot:
/dev/system/root: Superblock last mount time is in the future. FIXED.
/dev/system/root: Superblock last write time is in the future. FIXED.
- in /var/log/warn:
Aug 25 00:44:00 xenon powersaved[5356]: WARNING (CpufreqManagement:51) No capability cpufreq_control
Aug 25 00:44:00 xenon powersaved[5356]: WARNING (CpufreqManagement:51) No capability cpufreq_control
And the SUSE startup sequence displays "failed" for the acpid daemon.
So it seems there is some strangeness wrt system time and power management. I don't have the time to bisect this right now, but wanted to let you know anyway. Apart from that, the kernel seems to work fine.
HTH
-- 
Tilman Schmidt E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)
On Sat, 25 Aug 2007 01:27:25 +0200 What architecture? if x86_64 then perhaps something went wrong with the old x86_64 dynticks leftovers which were in rc3-mm1. I've just merged a shiny fresh new series so perhaps things there got fixed. Please retest next -mm. Dunno, there're some significant-looking cpufreq changes in there, such as cpufreq-allow-ondemand-and-conservative-cpufreq-governors-to-be-used-as-default.patch. Maybe we went and chose a different governor for you? Perhaps it would be helpful if you could do a diff -u dmesg-2.6.23-rc3 dmesg-2.6.23-rc3-mm1 OK, thanks. -
This is probably not related to cpufreq changes itself. Looks like it is Not sure why this is failing though. Don't recall any significant changes in processor.ko recently apart from CPUIDLE stuff. Thanks, Venki -
This is indeed related to CPUIDLE. Tilman: Can you configure CONFIG_CPU_IDLE in your config (under Power Management option) and double check that the frequency part works after that. Andrew: Adam Belay sent a recent patchset on linux-pm and linux-acpi and one of the patches of that addresses this issue (CPUIDLE: load ACPI properly when CPUIDLE is disabled). Those patches should come to mm soon through acpi-test. Thanks, Venki -
Strangely enough, I do not see that option in "make xconfig". The "Power Management" subtree ends with "CPU Frequency scaling". In "make menuconfig" the option is there, though. After activating it, these two errors are indeed gone, and the "thermal: Unknown symbol acpi_processor_set_thermal_limit" one as well.
HTH
T.
-- 
Tilman Schmidt E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)
I'm sorry but I cannot reproduce the phenomenon anymore. After setting CONFIG_CPU_IDLE with "make menuconfig", when I ran "make xconfig" again it showed the option too. Moving .config.old back to .config doesn't make it disappear again. So it seems "make menuconfig" has eliminated whatever caused this. If you still want to have a look, both .config and .config.old are available at http://gollum.phnxsoft.com/~ts/2.6.23-rc3-mm1/ .
-- 
Tilman Schmidt E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)
On Fri, Aug 24, 2007 at 05:07:02PM -0700, Andrew Morton wrote: > > FATAL: Error inserting processor (/lib/modules/2.6.23-rc3-mm1-testing/kernel/drivers/acpi/processor.ko): Input/output error > > WARNING: Error inserting processor (/lib/modules/2.6.23-rc3-mm1-testing/kernel/drivers/acpi/processor.ko): Input/output error > > > > Aug 25 00:44:00 xenon powersaved[5356]: WARNING (CpufreqManagement:51) No capability cpufreq_control > > Aug 25 00:44:00 xenon powersaved[5356]: WARNING (CpufreqManagement:51) No capability cpufreq_control > > Dunno, there're some significant-looking cpufreq changes in there, such as > cpufreq-allow-ondemand-and-conservative-cpufreq-governors-to-be-used-as-default.patch. > Maybe we went and chose a different governor for you? More likely, he was using a cpufreq driver that required acpi functionality, and because processor.ko went boom, the house of cards came tumbling down. I long for the olde days when acpi changes didn't end up with finger pointing at cpufreq. Dave -- http://www.codemonkey.org.uk -
Hrmm. I'm not super familiar w/ SuSE's init scripts, but I'm guessing that's the ntpdate call. And "Tuesday Jul 24th"? Sounds about a month Does this show up before or after the above date stuff? Does the issue go away using an older kernel (I want to eliminate easy stuff like CMOS batteries giving up)? Also you're not using Linus' CMOS corrupting suspend/resume debugging trick, right (I'm forgetting the CONFIG name). thanks -john -
Nope. The ntpdate call comes much later, and finally sets the system clock
After the "your system time is not correct" messages, and before the regular "Try to get initial date and time via NTP" message accompanying
It does. Booting 2.6.23-rc3 after that, the system comes up with none
PM_TRACE? No. The entire PM_DEBUG branch is turned off.
-- 
Tilman Schmidt E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)
The one allowing drivers/scsi/advansys.c to compile with CONFIG_ISA=n:
Date: Wed, 22 Aug 2007 10:28:02 -0600
From: Matthew Wilcox <matthew@wil.cx>
Subject: Re: drivers/scsi/advansys.c - ld error ( Re: 2.6.23-rc3-mm1 )
Message-ID: <20070822162802.GJ9163@parisc-linux.org>
When CONFIG_ISA is disabled, the isa_driver support will not be compiled
i386. The machine is a Pentium D 940 which would be x86_64 capable,
I hope the attached helps. I created it by taking /var/log/boot.msg of the two systems, removing everything after "Kernel logging stopped", editing out the printk timestamps and then running diff -u on them, so it should be more or less the dmesg diff. I did not edit out any of the differences because I'm lazy. (And also because I wasn't so sure what would or wouldn't be interesting for you.)
-- 
Tilman Schmidt E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)
[attachment: bootmsg.diff]
--- /tmp/bootmsg-2.6.23-rc3 2007-08-25 02:25:54.000000000 +0200
+++ /tmp/bootmsg-2.6.23-rc3-mm1 2007-08-25 02:26:08.000000000 +0200
@@ -1,10 +1,10 @@
-Inspecting /boot/System.map-2.6.23-rc3-testing
-Loaded 27522 symbols from /boot/System.map-2.6.23-rc3-testing.
+Inspecting /boot/System.map-2.6.23-rc3-mm1-testing
+Loaded 28663 symbols from /boot/System.map-2.6.23-rc3-mm1-testing.
Symbols match kernel version 2.6.23.
No module symbols loaded - kernel modules not enabled.

klogd 1.4.1, log source = ksyslog started.
-<5>Linux version 2.6.23-rc3-testing (ts@xenon) (gcc version 4.1.2 20061115 (prerelease) ...
I wonder if that was supposed to happen. It's also happening in 2.6.23-rc3 base. I don't see anything there which would cause you to lose the clock setting, but there are obviously a few things going wrong in the time-management area here. Please explicitly retest this stuff as the code evolves and keep us informed of the problems. Thanks. -
On Fri, Aug 24, 2007 at 08:30:00PM -0700, Andrew Morton wrote: > > <6>Linux agpgart interface v0.102 > > +<6>rtc_cmos 00:03: rtc core: registered rtc_cmos as rtc0 > > +<4>rtc_cmos: probe of 00:03 failed with error -16 > > +<6>agpgart: suspend/resume problematic: resume with 3D/DRI active may lockup X.Org > > +<4>on some chipset/BIOS combos (see DEBUG_AGP_PM in intel-agp.c) > > <6>agpgart: Detected an Intel 965Q Chipset. > > <6>agpgart: Unknown page table size, assuming 512KB > > <6>agpgart: Detected 7676K stolen memory. > > <6>agpgart: AGP aperture is 256M @ 0x40000000 > > -<6>rtc_cmos 00:03: rtc core: registered rtc_cmos as rtc0 > > -<4>rtc_cmos: probe of 00:03 failed with error -16 > > I wonder if that was supposed to happen. It's also happening in 2.6.23-rc3 > base. EBUSY. I've seen this happen when you have both CONFIG_RTC and CONFIG_RTC_DRV_CMOS set. Dave -- http://www.codemonkey.org.uk -
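For anyone decoding "failed with error -16" by hand: kernel-side failures are conventionally reported as negative errno values, so -16 maps to EBUSY on Linux. A small sketch of that decoding, using only errno.h and strerror(); the helper names are made up for the example.

```c
#include <errno.h>
#include <string.h>

/* Turn a kernel-style return code (0 or a negative errno) into text. */
const char *sketch_describe(int ret)
{
    return ret < 0 ? strerror(-ret) : "success";
}

/* True if the probe failed because the resource was already claimed. */
int sketch_is_busy(int ret)
{
    return ret == -EBUSY;
}
```

That fits Dave's explanation: with two RTC drivers configured, whichever one probes second finds the device already taken.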
Hello, On Sat, 25 Aug 2007 00:28:09 -0400 This one is becoming quite worth an entry in a FAQ, it pops up once every month ;) There was a discussion about preventing both being set at the same time when configuring, but I don't remember how it ended... Paul -
I must have missed that discussion. I have:
CONFIG_RTC=y
CONFIG_RTC_DRV_CMOS=m
because both of these options claim in their help texts that you should select them if you want to access the PC RTC.
-- 
Tilman Schmidt E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)
With 2.6.23-rc4-mm1 this doesn't happen anymore,
This still happens identically with 2.6.23-rc4-mm1.
2.6.23-rc4-mm1 reverts to mainline behaviour here. (ie. "busy" instead of "no address or irqs")
Gone in 2.6.23-rc4-mm1.
HTH
Tilman
-- 
Tilman Schmidt E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)
Hi,
I've found a regression against 2.6.23-rc2-mm2. X server shutdown
hard-freezes the (untainted) kernel. Nothing on netconsole, X output follows:
X Window System Version 1.3.0
Release Date: 19 April 2007
X Protocol Version 11, Revision 0, Release 1.3
Build Operating System: Fedora Core 7 Red Hat, Inc.
Current Operating System: Linux bellona 2.6.23-rc3-mm1 #315 SMP Wed Aug 22
21:43:06 CEST 2007 i686
Build Date: 11 June 2007
Build ID: xorg-x11-server 1.3.0.0-9.fc7
Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
Module Loader present
Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Sun Aug 26 14:22:43 2007
(==) Using config file: "/etc/X11/xorg.conf"
(WW) RADEON: No matching Device section for instance (BusID PCI:1:0:1) found
(**) RADEON(0): RADEONPreInit
(II) Module already built-in
(II) Module already built-in
(II) Module already built-in
(**) RADEON(0): RADEONScreenInit f0000000 0
(**) RADEON(0): Map: 0xf0000000, 0x04000000
(**) RADEON(0): RADEONSave
(**) RADEON(0): RADEONSaveMode(0x8240870)
(**) RADEON(0): Read: 0x0000000c 0x00030065 0x00000000
(**) RADEON(0): Read: rd=12, fd=101, pd=3
(**) RADEON(0): RADEONSaveMode returns 0x8240870
(**) RADEON(0): DRI New memory map param
(**) RADEON(0): RADEONInitMemoryMap() :
(**) RADEON(0): mem_size : 0x04000000
(**) RADEON(0): MC_FB_LOCATION : 0xf3fff000
(**) RADEON(0): MC_AGP_LOCATION : 0xffffffc0
(**) RADEON(0): RADEONModeInit()
1280x1024 108.00 1280 1328 1440 1688 1024 1025 1028 1066 (24,32) +H +V
1280x1024 108.00 1280 1328 1440 1688 1024 1025 1028 1066 (24,32) +H +V
(**) RADEON(0): Pitch = 10485920 bytes (virtualX = 1280, displayWidth = 1280)
(**) RADEON(0): dc=10800, of=21600, fd=96, pd=2
(**) RADEON(0): RADEONInit returns ...
Did this get through to your mailboxes? Any progress, clue, idea?
--
Jiri Slaby (jirislaby@gmail.com)
Faculty of Informatics, Masaryk University
-
The intel driver on an integrated i915 also causes this (NoAccel has no effect in this case). Note that 2.6.23-rc4-mm1 is affected by this behaviour too. I have a trace for you: http://www.fi.muni.cz/~xslaby/sklad/panics/x-freeze.png (this is all I have been able to grab so far) regards, -- Jiri Slaby (jirislaby@gmail.com) Faculty of Informatics, Masaryk University -
afaict everything on that call trace is good. I guess it's possible that one of the higher-level loops has gone infinite (eg, the one in agp_remove_controller()). Are you able to get netconsole working, and run sysrq-P and sysrq-T ten or so times, to see if it's always stuck in the same place on the same CPU? -
Removed gareth@valinux.com (dead e-mail) --------------------------------^^^ sorry, both netconsole and my usb devices (including my keyboard -- no numlock There is no loop in this function. Anyway, going to track this whole issue down. thanks so far, -- Jiri Slaby (jirislaby@gmail.com) Faculty of Informatics, Masaryk University -
Hm, I suspect Andi's x86_64-mm-cpa-clflush.patch or something like that. It loops in flush_kernel_map in list_for_each_entry on the first CPU. The a->l list is somehow corrupted I guess. regards, -- Jiri Slaby (jirislaby@gmail.com) Faculty of Informatics, Masaryk University -
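The thread does not establish how the a->l list got corrupted. As an illustration only, here is a minimal user-space sketch (not kernel code) of one way a circular doubly linked list can make a list_for_each_entry-style walk spin forever: queuing an entry that is already on the list. The helper names list_add_unchecked and list_walk_bounded are made up for this sketch; the walk is bounded so the demonstration terminates.

```c
#include <stddef.h>

/* Same shape as the kernel's struct list_head, re-declared for user space. */
struct list_head {
	struct list_head *next, *prev;
};

void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head, with no sanity checks at all. */
void list_add_unchecked(struct list_head *entry, struct list_head *head)
{
	struct list_head *next = head->next;

	next->prev = entry;
	entry->next = next;
	entry->prev = head;
	head->next = entry;
}

/*
 * Walk the list the way list_for_each does, but give up after limit
 * steps. Returns the number of entries, or -1 if the walk never gets
 * back to head within the limit.
 */
long list_walk_bounded(const struct list_head *head, long limit)
{
	const struct list_head *pos;
	long n = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		if (++n > limit)
			return -1;
	return n;
}
```

With entries a and b added once each, the walk visits two entries. Adding a a second time leaves a->next pointing at b and b->next pointing at a, so the walk cycles between the two and never reaches head again.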
Does it still happen with the latest version ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/cpa-clflush ? (you might need to replace the other cpa-* patches too because they depend on each other) -Andi -
I think so :) $ diff -u x86_64-mm-cpa-clflush.patch cpa-clflush |wc -l And are there any changes against the -rc4-mm1 in those patches? BTW it is reproducible for me on two different machines (i386-x86_64, radeon-intel), don't you have the problem too? regards, -- Jiri Slaby (jirislaby@gmail.com) Faculty of Informatics, Masaryk University -
No problems here with a radeon, no. Does your CPU have clflush or not in /proc/cpuinfo? -Andi -
yes:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz
stepping : 11
cpu MHz : 2992.543
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips : 5991.99
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz
stepping : 11
cpu MHz : 2992.543
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips : 5985.42
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
--
Jiri Slaby (jirislaby@gmail.com)
Faculty of Informatics, Masaryk University
-
BTW this is what my flush_kernel_map looks like:
static void flush_kernel_map(void *arg)
{
	struct flush_arg *a = (struct flush_arg *)arg;
	struct page *pg;
	unsigned int xx = 0;
	/* When clflush is available use it because it is
	   much cheaper than WBINVD. */
	printk("%s: 1\n", __func__);
	if (a->full_flush || !cpu_has_clflush)
		asm volatile("wbinvd" ::: "memory");
	else list_for_each_entry(pg, &a->l, lru) {
		printk("%s: %10u 1a\n", __func__, xx++);
		if (PageFlush(pg))
			clflush_cache_range(page_address(pg), PAGE_SIZE);
	}
	printk("%s: 2\n", __func__);
	__flush_tlb_all();
	printk("%s: 3\n", __func__);
}
It prints 1a in an infinite loop with xx incrementing, but only in this case;
some global_flush_tlb calls are OK, e.g.:
Sep 10 01:39:19 localhost kernel: agpgart: Detected an Intel G33 Chipset.
Sep 10 01:39:19 localhost kernel: global_flush_tlb: 1
Sep 10 01:39:19 localhost kernel: flush_kernel_map: 1
Sep 10 01:39:19 localhost kernel: flush_kernel_map: 0 1a
Sep 10 01:39:19 localhost kernel: flush_kernel_map: 1 1a
Sep 10 01:39:19 localhost kernel: flush_kernel_map: 2
Sep 10 01:39:19 localhost kernel: flush_kernel_map: 3
Sep 10 01:39:19 localhost kernel: flush_kernel_map: 1
Sep 10 01:39:19 localhost kernel: flush_kernel_map: 0 1a
Sep 10 01:39:19 localhost kernel: flush_kernel_map: 1 1a
Sep 10 01:39:19 localhost kernel: flush_kernel_map: 2
Sep 10 01:39:19 localhost kernel: flush_kernel_map: 3
Sep 10 01:39:19 localhost kernel: global_flush_tlb: 2
Sep 10 01:39:19 localhost kernel: global_flush_tlb: 3
Sep 10 01:39:19 localhost kernel: agpgart: Detected 6140K stolen memory.
It seems that the list is broken only on X shutdown. How can the deferred-pages
list end up badly initialized when list_replace_init is called on it? Weird.
--
Jiri Slaby (jirislaby@gmail.com)
Faculty of Informatics, ...
Can you stick a printk into change_page_attr to log in which order it changes pages (including their addresses and the pgattr)?
-Andi
-
printk("%s: %p %p %.16lx %d\n", __func__, page, page_address(page), pgprot_val(prot), numpages);
http://www.fi.muni.cz/~xslaby/sklad/panics/x-freeze_chpa.png
What other info is needed?
--
Jiri Slaby (jirislaby@gmail.com)
Faculty of Informatics, Masaryk University
-
I'm seeing this on my 965gm chipset with Andi's clflush patches on x86 32-bit, it looks like an interaction with the agp code which does a big bunch of change page attr to allocate the AGP aperture backed memory.. I think the code might have worked in a previous iteration on my 64-bit 965G machine at home but I'm on the road and won't be back anytime soon.. I'll see what I can figure out from my laptop... Dave. -
Ok, here comes a BUG with trace:
set status page addr 0x00033000
list_add corruption. next->prev should be prev (ffffffff8068ae70), but was ffffffff80697a50. (next=ffff81000117fbd0).
------------[ cut here ]------------
kernel BUG at /home/l/latest/xxx/lib/list_debug.c:27!
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:02.0/enable
CPU 0
Modules linked in: ipv6 floppy sr_mod rtc_cmos rtc_core cdrom ehci_hcd rtc_lib usbhid
Pid: 1639, comm: X Not tainted 2.6.23-rc4-mm1_64 #23
RIP: 0010:[<ffffffff80332f49>] [<ffffffff80332f49>] __list_add+0x39/0x60
RSP: 0018:ffff81000547bd48 EFLAGS: 00010296
RAX: 0000000000000079 RBX: ffff81000117a380 RCX: ffffffff8068b450
RDX: ffff81000317d6a0 RSI: 0000000000000001 RDI: ffffffff8068b420
RBP: ffff81000547bd48 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000006da2
R13: ffff810006c10d10 R14: ffff810006da2000 R15: 8000000000000163
FS: 00007f7a05258710(0000) GS:ffffffff806d1000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000a33040 CR3: 0000000004a43000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process X (pid: 1639, threadinfo ffff81000547a000, task ffff81000317d6a0)
Stack: ffff81000547bd58 ffffffff80332f7c ffff81000547bdc8 ffffffff80225c56
ffffffff80225cb5 8000000000000163 ffffffff806830e8 ffffffff806830a0
ffffffff806830a0 ffff810006da2000 ffff81000547bda8 0000000000006da2
Call Trace:
[<ffffffff80332f7c>] list_add+0xc/0x10
[<ffffffff80225c56>] __change_page_attr+0x376/0x390
[<ffffffff80225cb5>] change_page_attr_addr+0x45/0x140
[<ffffffff80225d16>] change_page_attr_addr+0xa6/0x140
[<ffffffff80225de3>] change_page_attr+0x33/0x40
[<ffffffff80387b64>] agp_generic_destroy_page+0x44/0x70
[<ffffffff80388645>] agp_free_memory+0x65/0xd0
[<ffffffff80386d49>] agp_free_memory_wrap+0x39/0x60
...
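The "next->prev should be prev" message comes from a sanity check made before inserting between two neighbours. What follows is a user-space sketch of that kind of check, assuming it compares the neighbours' links before touching them; it is not the actual lib/list_debug.c code, and list_add_checked is a name invented here. Instead of calling BUG() it returns false.

```c
#include <stdbool.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/*
 * Insert entry between prev and next, but only if the two neighbours
 * still point at each other. A mismatch means some earlier operation
 * already damaged the list, so nothing is modified.
 */
bool list_add_checked(struct list_head *entry,
		      struct list_head *prev, struct list_head *next)
{
	if (next->prev != prev || prev->next != next)
		return false;	/* the kernel check reports and BUG()s here */

	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
	return true;
}
```

The check fires at the next insertion after the damage was done, so the function named in the trace is where the corruption was noticed, not necessarily where it was caused.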
"struct menu_governor" needlessly again became global.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
---
cb33b296204127cf50df54b84b2d79e152fb924b
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index f5a8865..8d3fdc5 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -117,7 +117,7 @@ static int menu_enable_device(struct cpuidle_device *dev)
return 0;
}
-struct cpuidle_governor menu_governor = {
+static struct cpuidle_governor menu_governor = {
.name = "menu",
.rating = 20,
.enable = menu_enable_device,
-
This is already fixed in the most recent ACPI CPUIDLE tree. Thanks, Adam -
parport_device_num() is no longer used.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
---
Documentation/parport-lowlevel.txt | 29 +++--------------------------
drivers/parport/daisy.c | 29 -----------------------------
include/linux/parport.h | 1 -
3 files changed, 3 insertions(+), 56 deletions(-)
0066510df2b5d4972cfd6a4450af8b82c763adfd
diff --git a/Documentation/parport-lowlevel.txt b/Documentation/parport-lowlevel.txt
index 8f23024..265fcdc 100644
--- a/Documentation/parport-lowlevel.txt
+++ b/Documentation/parport-lowlevel.txt
@@ -25,7 +25,6 @@ Global functions:
parport_open
parport_close
parport_device_id
- parport_device_num
parport_device_coords
parport_find_class
parport_find_device
@@ -735,7 +734,7 @@ NULL is returned.
SEE ALSO
-parport_register_device, parport_device_num
+parport_register_device
parport_close - unregister device for particular device number
-------------
@@ -787,29 +786,7 @@ Many devices have ill-formed IEEE 1284 Device IDs.
SEE ALSO
-parport_find_class, parport_find_device, parport_device_num
-
-parport_device_num - convert device coordinates to device number
-------------------
-
-SYNOPSIS
-
-#include <linux/parport.h>
-
-int parport_device_num (int parport, int mux, int daisy);
-
-DESCRIPTION
-
-Convert between device coordinates (port, multiplexor, daisy chain
-address) and device number (zero-based).
-
-RETURN VALUE
-
-Device number, or -1 if no device at given coordinates.
-
-SEE ALSO
-
-parport_device_coords, parport_open, parport_device_id
+parport_find_class, parport_find_device
parport_device_coords - convert device number to device coordinates
------------------
@@ -833,7 +810,7 @@ Zero on success, in which case the coordinates are (*parport, *mux,
SEE ALSO
-parport_device_num, parport_open, parport_device_id
+parport_open, parport_device_id
parport_find_class - find a device by its class
------------------
diff ...
do_restart_poll() can become static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
---
59cd2d11f5f0189973bb280c59262eb50984cb88
diff --git a/fs/select.c b/fs/select.c
index 5a3ab01..3e515aa 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -711,7 +711,7 @@ out_fds:
return err;
}
-long do_restart_poll(struct restart_block *restart_block)
+static long do_restart_poll(struct restart_block *restart_block)
{
struct pollfd __user *ufds = (struct pollfd __user*)restart_block->arg0;
int nfds = restart_block->arg1;
-
snd_ctl_elem_{read,write} no longer have any modular users.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
---
sound/core/control.c | 4 ----
1 file changed, 4 deletions(-)
23e15051dde57c569e4c9aff1339aaf64185ea71
diff --git a/sound/core/control.c b/sound/core/control.c
index 396e98e..6144d8a 100644
--- a/sound/core/control.c
+++ b/sound/core/control.c
@@ -716,8 +716,6 @@ int snd_ctl_elem_read(struct snd_card *card, struct snd_ctl_elem_value *control)
return result;
}
-EXPORT_SYMBOL(snd_ctl_elem_read);
-
static int snd_ctl_elem_read_user(struct snd_card *card,
struct snd_ctl_elem_value __user *_control)
{
@@ -781,8 +779,6 @@ int snd_ctl_elem_write(struct snd_card *card, struct snd_ctl_file *file,
return result;
}
-EXPORT_SYMBOL(snd_ctl_elem_write);
-
static int snd_ctl_elem_write_user(struct snd_ctl_file *file,
struct snd_ctl_elem_value __user *_control)
{
-
sys_{open,read} can finally be unexported.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
---
fs/open.c | 1 -
fs/read_write.c | 1 -
2 files changed, 2 deletions(-)
6f6884f9ee675f2e804c6c58ca46337f9765dd0d
diff --git a/fs/open.c b/fs/open.c
index 23f334d..c0814de 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1057,7 +1057,6 @@ asmlinkage long sys_open(const char __user *filename, int flags, int mode)
prevent_tail_call(ret);
return ret;
}
-EXPORT_SYMBOL_GPL(sys_open);
asmlinkage long sys_openat(int dfd, const char __user *filename, int flags,
int mode)
diff --git a/fs/read_write.c b/fs/read_write.c
index 507ddff..d913d1e 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -370,7 +370,6 @@ asmlinkage ssize_t sys_read(unsigned int fd, char __user * buf, size_t count)
return ret;
}
-EXPORT_SYMBOL_GPL(sys_read);
asmlinkage ssize_t sys_write(unsigned int fd, const char __user * buf, size_t count)
{
-
On Mon, 27 Aug 2007 23:27:23 +0200 isn't sys_close in the same boat? -
Still used in fs/binfmt_misc.c ...
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
-
<-- snip -->
...
AS arch/m32r/kernel/entry.o
/home/bunk/linux/kernel-2.6/linux-2.6.23-rc3-mm1/arch/m32r/kernel/entry.S: Assembler messages:
/home/bunk/linux/kernel-2.6/linux-2.6.23-rc3-mm1/arch/m32r/kernel/entry.S:358: Error: bad instruction `addi r0,#(((((0)+(64))+(32))+(32)))'
make[2]: *** [arch/m32r/kernel/entry.o] Error 1
<-- snip -->
cu
Adrian
-
From: Adrian Bunk <bunk@kernel.org>
Subject: 2.6.23-rc3-mm1: m32r defconfig compile error
Hello, Adrian,
Thank you for the report.
M32700UT/OPSPUT Users,
Please apply this patch to build an m32r 2.6.23-rc3-mm1 kernel.
This patch has also been included in my m32r kernel development git repository.
git://www.linux-m32r.org/git/takata/linux-2.6_dev.git linux-m32r
Thanks,
-- Takata
[PATCH 2.6.23-rc3-mm1] m32r: build fix of entry.S
This patch is required to fix build errors for the modification:
m32r: Simplify ei_handler code
commit f6c7546d53a4288501dcdd96d5297214697e7237
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
---
arch/m32r/kernel/entry.S | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/m32r/kernel/entry.S b/arch/m32r/kernel/entry.S
index c46cfaa..42b08bf 100644
--- a/arch/m32r/kernel/entry.S
+++ b/arch/m32r/kernel/entry.S
@@ -355,7 +355,7 @@ ENTRY(ei_handler)
lduh r0, @(low(M32R_INT0ICU_ISTS),r0) ; bit10-6 : ISN
slli r0, #21
srli r0, #27 ; ISN
- addi r0, #(M32R_INT0ICU_IRQ_BASE)
+ add3 r0, r0, #(M32R_INT0ICU_IRQ_BASE)
bra check_end
.fillinsn
4:
@@ -367,7 +367,7 @@ ENTRY(ei_handler)
lduh r0, @(low(M32R_INT2ICU_ISTS),r0) ; bit10-6 : ISN
slli r0, #21
srli r0, #27 ; ISN
- addi r0, #(M32R_INT2ICU_IRQ_BASE)
+ add3 r0, r0, #(M32R_INT2ICU_IRQ_BASE) ;
bra check_end
.fillinsn
5:
--
1.5.2.4
--
Hirokazu Takata <takata@linux-m32r.org>
Linux/M32R Project: http://www.linux-m32r.org/
-
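Neither message says why the assembler rejects the addi. The error text shows the immediate expanding to 0+64+32+32 = 128. Assuming that the M32R addi takes a signed 8-bit immediate while add3 takes a signed 16-bit one (an assumption about the instruction set, not something stated in the thread), 128 is one past the largest value addi can encode, which would explain the switch to add3. The helper fits_signed below is made up for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if value can be encoded in a two's-complement
 * immediate field that is bits wide (1 <= bits <= 63).
 */
bool fits_signed(int64_t value, unsigned bits)
{
	int64_t min = -((int64_t)1 << (bits - 1));
	int64_t max = ((int64_t)1 << (bits - 1)) - 1;

	return value >= min && value <= max;
}
```

Under that assumption, fits_signed(128, 8) is false and fits_signed(128, 16) is true, while the previous, smaller IRQ base values would still have fit in 8 bits.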
This patch removes the unused unwind exports.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
---
kernel/unwind.c | 4 ----
1 file changed, 4 deletions(-)
844ccf670a8204df45b89407bb0e5867f03d0f71
diff --git a/kernel/unwind.c b/kernel/unwind.c
index 8c267c7..adb1ebe 100644
--- a/kernel/unwind.c
+++ b/kernel/unwind.c
@@ -1243,7 +1243,6 @@ int unwind(struct unwind_frame_info *frame)
#undef CASES
#undef FRAME_REG
}
-EXPORT_SYMBOL(unwind);
int unwind_init_frame_info(struct unwind_frame_info *info,
struct task_struct *tsk,
@@ -1255,7 +1254,6 @@ int unwind_init_frame_info(struct unwind_frame_info *info,
return 0;
}
-EXPORT_SYMBOL(unwind_init_frame_info);
/*
* Prepare to unwind a blocked task.
@@ -1269,7 +1267,6 @@ int unwind_init_blocked(struct unwind_frame_info *info,
return 0;
}
-EXPORT_SYMBOL(unwind_init_blocked);
/*
* Prepare to unwind the currently running thread.
@@ -1284,5 +1281,4 @@ int unwind_init_running(struct unwind_frame_info *info,
return arch_unwind_init_running(info, callback, arg);
}
-EXPORT_SYMBOL(unwind_init_running);
-
noautodma can now be unexported.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
---
957dc7601c050cb14a7afc842db0c2d62aaf3509
diff --git a/drivers/ide/ide.c b/drivers/ide/ide.c
index b3b5f00..5b09066 100644
--- a/drivers/ide/ide.c
+++ b/drivers/ide/ide.c
@@ -100,8 +100,6 @@ static int ide_scan_direction; /* THIS was formerly 2.2.x pci=reverse */
int noautodma = 0;
-EXPORT_SYMBOL(noautodma);
-
#ifdef CONFIG_BLK_DEV_IDEACPI
int ide_noacpi = 0;
int ide_noacpitfs = 1;
-
This patch fixes an obvious bug in mixdev_open_devices().
Signed-off-by: Adrian Bunk <bunk@kernel.org>
---
bb574366744163ff84609843ee43e84a39f57d5a
diff --git a/drivers/input/mousedev.c b/drivers/input/mousedev.c
index 2ad8633..4ca0ad3 100644
--- a/drivers/input/mousedev.c
+++ b/drivers/input/mousedev.c
@@ -461,7 +461,7 @@ static void mixdev_open_devices(void)
list_for_each_entry(mousedev, &mousedev_mix_list, mixdev_node) {
if (!mousedev->mixdev_open) {
- if (mousedev_open_device(mousedev));
+ if (mousedev_open_device(mousedev))
continue;
mousedev->mixdev_open = 1;
-
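For readers wondering why a single ';' counts as an obvious bug: in C, `if (cond);` is a complete statement with an empty body, so whatever follows runs unconditionally. The sketch below models the pre-patch and post-patch loops in user space. open_device and the two counting functions are hypothetical stand-ins, not the mousedev code, and the buggy variant spells out the empty body explicitly so that it compiles without warnings.

```c
#include <stddef.h>

/* Hypothetical stand-in for mousedev_open_device(): nonzero means failure. */
int open_device(int should_fail)
{
	return should_fail ? -1 : 0;
}

/*
 * What the compiler sees for the pre-patch code
 *     if (open_device(...));
 *         continue;
 * The ';' is the entire body of the if, so continue always runs.
 */
size_t count_opened_buggy(const int *should_fail, size_t n)
{
	size_t opened = 0;

	for (size_t i = 0; i < n; i++) {
		if (open_device(should_fail[i])) {
			/* empty body left behind by the stray ';' */
		}
		continue;
		opened++;	/* never reached */
	}
	return opened;
}

/* Post-patch shape: skip only the devices that failed to open. */
size_t count_opened_fixed(const int *should_fail, size_t n)
{
	size_t opened = 0;

	for (size_t i = 0; i < n; i++) {
		if (open_device(should_fail[i]))
			continue;
		opened++;
	}
	return opened;
}
```

In the buggy form no device is ever marked open, whether or not opening it succeeded.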
This patch fixes an obvious bug in ivtvfb_release_buffers().
Signed-off-by: Adrian Bunk <bunk@kernel.org>
---
093bdc9ba94bffbec2ed44743418899771488832
diff --git a/drivers/media/video/ivtv/ivtv-fb.c b/drivers/media/video/ivtv/ivtv-fb.c
index 0080765..8a344d5 100644
--- a/drivers/media/video/ivtv/ivtv-fb.c
+++ b/drivers/media/video/ivtv/ivtv-fb.c
@@ -1068,8 +1068,8 @@ static void ivtvfb_release_buffers (struct ivtv *itv)
struct osd_info *oi = itv->osd_info;
/* Release cmap */
- if (oi->ivtvfb_info.cmap.len);
- fb_dealloc_cmap(&oi->ivtvfb_info.cmap);
+ if (oi->ivtvfb_info.cmap.len)
+ fb_dealloc_cmap(&oi->ivtvfb_info.cmap);
/* Release pseudo palette */
if (oi->ivtvfb_info.pseudo_palette)
-
Ouch. Thanks. Mauro, I've added this patch to my v4l-dvb tree. Can you pull from it? Thanks, -
This patch fixes two obvious bugs.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
---
drivers/net/wireless/iwl-base.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
600ffdc11b25ac0aee15271d1b2ce99a367efa31
diff --git a/drivers/net/wireless/iwl-base.c b/drivers/net/wireless/iwl-base.c
index b8293fe..f65c30e 100644
--- a/drivers/net/wireless/iwl-base.c
+++ b/drivers/net/wireless/iwl-base.c
@@ -343,7 +343,7 @@ int iwl_tx_queue_init(struct iwl_priv *priv,
* command is very huge the system will not have two scan at the
* same time */
len = sizeof(struct iwl_cmd) * slots_num;
- if (txq_id == IWL_CMD_QUEUE_NUM);
+ if (txq_id == IWL_CMD_QUEUE_NUM)
len += IWL_MAX_SCAN_SIZE;
txq->cmd = pci_alloc_consistent(dev, len, &txq->dma_addr_cmd);
if (!txq->cmd)
@@ -390,7 +390,7 @@ void iwl_tx_queue_free(struct iwl_priv *priv, struct iwl_tx_queue *txq)
iwl_hw_txq_free_tfd(priv, txq);
len = sizeof(struct iwl_cmd) * q->n_window;
- if (q->id == IWL_CMD_QUEUE_NUM);
+ if (q->id == IWL_CMD_QUEUE_NUM)
len += IWL_MAX_SCAN_SIZE;
pci_free_consistent(dev, len, txq->cmd, txq->dma_addr_cmd);
-
Shame on me. -
-maccumulate-outgoing-args on i386 count:
1 + 1 - 1 = 1
If the stack unwinder needs it, please enable it explicitly only when
the unwinder is enabled - we are talking about a 2.5% size increase
with defconfig.
cu
Adrian
-
I got this during gxine initialization of the ocko.tv live stream, with no CD in any
of the CD-ROM drives:
BUG: unable to handle kernel NULL pointer dereference at virtual address 0000005c
printing eip: f88fbe7a *pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in: ath5k arc4 ecb blkcipher cryptomgr crypto_algapi
rc80211_simple mac80211 cfg80211 nls_cp437 vfat fat usb_storage tun ipv6 floppy
parport_pc parport ohci1394 ieee1394 usbhid sr_mod ehci_hcd cdrom ff_memless
Pid: 2809, comm: hald-addon-stor Not tainted (2.6.23-rc3-mm1 #315)
EIP: 0060:[<f88fbe7a>] EFLAGS: 00010246 CPU: 1
EIP is at sr_block_release+0xb/0x2c [sr_mod]
EAX: 00000000 EBX: 00000000 ECX: f88fbe6f EDX: 00000000
ESI: c21c36c0 EDI: c289a780 EBP: c3729f18 ESP: c3729f10
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process hald-addon-stor (pid: 2809, ti=c3729000 task=c1c2be40 task.ti=c3729000)
Stack: 00000000 c21c36c0 c3729f38 c018d7ad c21c36cc c1f9ff80 c21c3730 c21c36c0
c2a6ada0 dcbb3f80 c3729f40 c018d7dc c3729f4c c018e103 00000010 c3729f74
c016bc5f 00000000 00000000 c217fa80 c1f9ff80 c2a6ada0 dcbb3f80 c1cc6900
Call Trace:
[<c0105022>] show_trace_log_lvl+0x1a/0x30
[<c01050dd>] show_stack_log_lvl+0xa5/0xca
[<c01051d2>] show_registers+0xd0/0x1c1
[<c01053cd>] die+0x10a/0x24d
[<c011afbe>] do_page_fault+0x496/0x608
[<c03768e2>] error_code+0x72/0x78
[<c018d7ad>] __blkdev_put+0x125/0x14a
[<c018d7dc>] blkdev_put+0xa/0xc
[<c018e103>] blkdev_close+0x29/0x2c
[<c016bc5f>] __fput+0xa6/0x161
[<c016bea4>] fput+0x22/0x3b
[<c016960b>] filp_close+0x41/0x67
[<c016a78c>] sys_close+0x60/0x9f
[<c01040ce>] syscall_call+0x7/0xb
=======================
Code: 0c 81 c3 4c 01 00 00 89 5c 24 08 89 44 24 04 c7 04 24 88 cd 8f f8 e8 99 84
82 c7 e9 04 fe ff ff 55 89 e5 56 53 8b 80 04 01 00 00 <8b> 40 5c 8b 70 3c 8d 46
18 e8 cf f6 fe ff 89 c3 85 c0 75 07 89
EIP: [<f88fbe7a>] sr_block_release+0xb/0x2c [sr_mod] SS:ESP 0068:c3729f10
regards,
--
Jiri Slaby (jirislaby@gmail.com)
Faculty of Informatics, Masaryk University
-
Hi Jiri,
Yup, that's an old habit of hald-addon-storage ... doing open(2), ioctl(2) and close(2) on the cdrom block device even when it's idle,
blkdev_put(bdev, ...)
__blkdev_put(bdev, ...)
sr_block_release(bdev->bd_inode, ...)
(sees bdev->bd_inode->i_bdev->bd_disk == NULL)
Doesn't seem like an sr_block_release() (or sr_mod) issue to me at all, looks more like a weird race in the block_device code ... can you send or put up some link to your .config? If this is reproducible (I bet it isn't, though) you could try bisecting.
Satyam
-
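The oops itself fits the NULL bd_disk reading: the faulting instruction bytes `<8b> 40 5c` are a load from offset 0x5c of EAX, EAX is 00000000, and the reported fault address is 0000005c. A small sketch of that arithmetic follows. The struct layout is hypothetical; only the 0x5c offset is taken from the oops, and no claim is made about which real field lives there.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 0x5c bytes of other members, then the field read. */
struct fake_disk {
	unsigned char other_fields[0x5c];
	uint32_t field_at_5c;
};

/*
 * Address that a load of field_at_5c touches for a given base pointer
 * value. Computed with offsetof, so nothing is actually dereferenced.
 */
uintptr_t field_address(uintptr_t base)
{
	return base + offsetof(struct fake_disk, field_at_5c);
}
```

A small fault address such as this one usually points at a member access through a NULL structure pointer rather than at a wild pointer.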
-- Jiri Slaby (jirislaby@gmail.com) Faculty of Informatics, Masaryk University -
Possibly due to remove-bdput-from-do_open-in-fs-block_devc.patch. That patch is "wrong" and I think the problem which it attempts to address actually lies in the cdrom code. viro was taking a look at it but appears to have recoiled in horror. I'll drop remove-bdput-from-do_open-in-fs-block_devc.patch so let's just watch out for any reoccurrence, thanks. -
Sorry for not catching this one sooner, but AFAICT, Fedora didn't ship a glibc
that trips over this (2.6.90-12) until Saturday and I installed it yesterday.
-22-rc6-mm1 demonstrated the same issue as well.
The issue: vdso and gettimeofday seem to be having a quarrel.
At boot, the system clock is just fine. Right when it hits the 5-minute
uptime mark (and suspiciously close to the jiffie rollover), the date suddenly
shoots forward 4096 seconds.
Dumb test script:
#!/bin/bash
log="uptime.`uname -r`"
touch /root/$log
tail -f /root/$log &
while /bin/true;
do
	uptime >> /root/$log
	date >> /root/$log
	sleep 1
done
Excerpt from runtime:
19:57:55 up 1 min, 0 users, load average: 0.09, 0.05, 0.01
Tue Aug 28 19:57:55 EDT 2007
19:57:56 up 1 min, 0 users, load average: 0.09, 0.05, 0.01
Tue Aug 28 19:57:56 EDT 2007
19:57:57 up 2 min, 0 users, load average: 0.08, 0.05, 0.01
Tue Aug 28 19:57:57 EDT 2007
19:57:58 up 2 min, 0 users, load average: 0.08, 0.05, 0.01
...
20:00:55 up 4 min, 0 users, load average: 0.01, 0.03, 0.00
Tue Aug 28 20:00:55 EDT 2007
20:00:56 up 4 min, 0 users, load average: 0.01, 0.03, 0.00
Tue Aug 28 20:00:56 EDT 2007
20:00:57 up 5 min, 0 users, load average: 0.01, 0.03, 0.00
Tue Aug 28 21:09:15 EDT 2007
20:00:58 up 5 min, 0 users, load average: 0.01, 0.03, 0.00
Tue Aug 28 21:09:16 EDT 2007
20:00:59 up 5 min, 0 users, load average: 0.01, 0.03, 0.00
Tue Aug 28 21:09:17 EDT 2007
uptime keeps reporting the right time, date goes flying ahead. Once we
get into this state, I can issue a 'date' command to set the *right* time,
and then it will immediately reset back. Doing a 'touch foo; ls -l foo'
shows the correct time.
Booting with vdso=0 makes the time/date run normally.
Ideas?
On Wed, 29 Aug 2007 10:04:33 EDT, Valdis.Kletnieks@vt.edu said: This is also open as a Fedora bug: https://bugzilla.redhat.com/show_bug.cgi?id=262481
So it's an interaction between the x86_64 vdso patches in Andi's tree and newer glibc, and we don't know which one is getting it wrong yet? If I ever get another -mm out the door (have been without electricity for several days) I'll drop the vdso changes until this is sorted out. -
glibc does nothing but call the code in the vdso. We have a function pointer variable which either has the old vsyscall value or the address of the function in the vdso. Everything else is identical. Unless the interface of the vdso function is different (which it shouldn't be) I don't think you can blame glibc.
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-
Don't bother, I tested this last night against a vanilla 2.6.23-rc3 kernel and it had the same issue as well. So Andi's vdso patches in his tree and/or the -mm kernel aren't to blame - it's in mainline as well. And it's been in for a while - I also hit it with a 2.6.22-rc6-mm1 kernel.
We also have: http://lkml.org/lkml/2007/7/29/376 (Time repeatedly jumps backwards ~4400 seconds.) -
Problem is present in stock 2.6.23-rc too. Still don't know whether it is the new glibc code or the vdso that's causing it, though. -
Updating on this issue: Another person and I have both reported on the Red Hat bugzilla that it's a clocksource issue - if you are using the hpet clocksource, the time warps, but booting with clocksource=acpi_pm works. Does this ring any bells?
Does this patch fix it?
-Andi
Add missing mask operation to vdso
vdso vgetns() didn't mask the time source offset calculation, which could
lead to time problems with 32bit HPET. Add the masking.
Thanks to Chuck Ebbert for tracking down.
Signed-off-by: Andi Kleen <ak@suse.de>
Index: linux/arch/x86_64/vdso/vclock_gettime.c
===================================================================
--- linux.orig/arch/x86_64/vdso/vclock_gettime.c
+++ linux/arch/x86_64/vdso/vclock_gettime.c
@@ -34,10 +34,11 @@ static long vdso_fallback_gettime(long c
static inline long vgetns(void)
{
+ long v;
cycles_t (*vread)(void);
vread = gtod->clock.vread;
- return ((vread() - gtod->clock.cycle_last) * gtod->clock.mult) >>
- gtod->clock.shift;
+ v = (vread() - gtod->clock.cycle_last) & gtod->clock.mask;
+ return (v * gtod->clock.mult) >> gtod->clock.shift;
}
static noinline int do_realtime(struct timespec *ts)
-
Confirming that does indeed fix it - booted with hpet clocksource and vdso=1, and the time didn't warp at the 5-minute mark. Tested-By: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Just found this duplicated code in 2.6.23-rc4, maybe it was supposed
to be something else? The second one was added in the x86_64 vdso patch.
arch/x86_64/kernel/vsyscall.c:
void update_vsyscall(struct timespec *wall_time, struct clocksource *clock)
{
unsigned long flags;
write_seqlock_irqsave(&vsyscall_gtod_data.lock, flags);
/* copy vsyscall data */
vsyscall_gtod_data.clock.vread = clock->vread;
vsyscall_gtod_data.clock.cycle_last = clock->cycle_last;
vsyscall_gtod_data.clock.mask = clock->mask;
vsyscall_gtod_data.clock.mult = clock->mult;
vsyscall_gtod_data.clock.shift = clock->shift;
vsyscall_gtod_data.wall_time_sec = wall_time->tv_sec;
vsyscall_gtod_data.wall_time_nsec = wall_time->tv_nsec; <===
vsyscall_gtod_data.sys_tz = sys_tz;
vsyscall_gtod_data.wall_time_nsec = wall_time->tv_nsec; <===
vsyscall_gtod_data.wall_to_monotonic = wall_to_monotonic;
write_sequnlock_irqrestore(&vsyscall_gtod_data.lock, flags);
}
-
Must have been a (harmless) merging mistake, but I bet gcc optimizes it out anyways. -Andi -
I did find this after some digging:
In the vdso code:
static inline long vgetns(void)
{
cycles_t (*vread)(void);
vread = gtod->clock.vread;
return ((vread() - gtod->clock.cycle_last) * gtod->clock.mult) >>
gtod->clock.shift;
}
Looks like an open-coded version of this in the kernel timekeeping code:
static inline s64 __get_nsec_offset(void)
{
cycle_t cycle_now, cycle_delta;
s64 ns_offset;
/* read clocksource: */
cycle_now = clocksource_read(clock);
/* calculate the delta since the last update_wall_time: */
cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
/* convert to nanoseconds: */
ns_offset = cyc2ns(clock, cycle_delta);
return ns_offset;
}
But the vdso version isn't doing any masking. And the mask is different for
different clocksources, so it has to track the underlying kernel's clocksource
when it gets changed.
-
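To make the masking point concrete, here is a user-space sketch of the delta calculation for a counter that is only 32 bits wide but is held in 64-bit variables. The function names are invented for this example, and the 14.31818 MHz rate used below is an assumed, commonly seen HPET frequency that the thread does not state.

```c
#include <stdint.h>

typedef uint64_t cycle_t;

/* Delta as the unpatched vgetns() computes it: plain 64-bit subtraction. */
cycle_t delta_unmasked(cycle_t now, cycle_t last)
{
	return now - last;
}

/* Delta as __get_nsec_offset() computes it: truncated to the counter width. */
cycle_t delta_masked(cycle_t now, cycle_t last, cycle_t mask)
{
	return (now - last) & mask;
}

/* Seconds until a counter of the given width wraps at the given rate. */
double wrap_seconds(unsigned bits, double hz)
{
	return (double)((uint64_t)1 << bits) / hz;
}
```

If the counter wraps from 0xfffffff0 to 0x00000010, the masked delta is 0x20 cycles, while the unmasked one is 0xffffffff00000020. With the assumed rate, wrap_seconds(32, 14318180.0) comes to roughly 300 seconds, which would line up with the warp appearing around the 5-minute uptime mark.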
vdso code needs to be all inlined because the vdso runs in ring 3 and cannot access other kernel code. It was open-coded to have more control over the code (vdso requirements). The vdso effectively only supports TSC and HPET (the other clock sources are not accessible from ring 3). TSC doesn't need a mask, but many HPETs need a 32-bit mask; good point. Does adding the mask to vgetns make the clock problems go away? -Andi -
Ahh.. that explains why acpi_pm clocksource doesn't trip over the problem....
