Booting 2.6.25-rc3 on my Ultra5 causes a hang before or as
the console is switched over to the framebuffer. The console
output is (extrapolated from dmesg in -rc2 and handwritten
notes, as I don't have a serial cable to my U5):

PROMLIB: Sun IEEE Boot Prom 'OBP 3.25.3 2000/06/29 14:12'
PROMLIB: Root node compatible:
*** the following line can't be seen in dmesg after rc2 has booted
console [earlyprom0] enabled
Linux version 2.6.25-rc3 (mikpe@sparge) (gcc version 4.2.3) #1 Mon Feb 25 18:49:41 CET 2008
ARCH: SUN4U
Ethernet address: 08:00:20:fd:ec:1f
[0000000200000000-fffff80000400000] page_structs=262144 node=0 entry=0/0
[0000000200000000-fffff80000800000] page_structs=262144 node=0 entry=1/0
[0000000200000000-fffff80000c00000] page_structs=262144 node=0 entry=2/0
[0000000200000000-fffff80001000000] page_structs=262144 node=0 entry=3/0
OF stdout device is: /pci@1f,0/pci@1,1/SUNW,m64B@2
PROM: Built device tree with 46617 bytes of memory.
On node 0 totalpages: 32299
Normal zone: 335 pages used for memmap
Normal zone: 0 pages reserved
Normal zone: 31964 pages, LIFO batch:7
Movable zone: 0 pages used for memmap
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 31964
Kernel command line: ro root=/dev/sda5
PID hash table entries: 1024 (order: 10, 8192 bytes)
clocksource: mult[28000] shift[16]
clockevent: mult[66666666] shift[32]
Console: colour dummy device 80x25
*** the following line can't be seen in dmesg after rc2 has booted
console handover: boot [earlyprom0] -> real [tty0]

At this point rc3 hangs hard and won't even respond to sysrq.

Another difference is that with rc2 the first few lines of kernel
output while the console is still in OF mode either aren't shown
or disappear quickly since the switch to the framebuffer occurs
within a fraction of a second after the kernel has been loaded.
With rc3 the kernel output (the text shown above) in the OF-mode
console is very very slow.

(I should have quoted my .config here but I forgot to bring it. ...
From: Mikael Pettersson <mikpe@it.uu.se>

Yes that's a new feature. Until we switch over to the "real"
console we print the log messages using the firmware console
routines. This way if an early crash or similar happens, you'll
see it and be able to report it instead of having to report with
"-p" on the command line.

I'll fire up my ultra5 and try to figure out what's wrong with
the atyfb framebuffer driver, that's where it's dying.
--
From: Mikael Pettersson <mikpe@it.uu.se>
Between the VT layer registering its console and the atyfb
driver initializing we get a crash, and it happens on all
sparc64 systems. It is caused by this commit and I am working
on a fix:
commit a0c1e9073ef7428a14309cba010633a6cd6719ea
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Sat Feb 23 15:23:57 2008 -0800
futex: runtime enable pi and robust functionality
Not all architectures implement futex_atomic_cmpxchg_inatomic(). The default
implementation returns -ENOSYS, which is currently not handled inside of the
futex guts.
Futex PI calls and robust list exits with a held futex result in an endless
loop in the futex code on architectures which have no support.
Fixing up every place where futex_atomic_cmpxchg_inatomic() is called would
add a fair amount of extra if/else constructs to the already complex code. It
is also not possible to disable the robust feature before user space tries to
register robust lists.
Compile time disabling is not a good idea either, as there are already
architectures with runtime detection of futex_atomic_cmpxchg_inatomic support.
Detect the functionality at runtime instead by calling
cmpxchg_futex_value_locked() with a NULL pointer from the futex initialization
code. This is guaranteed to fail, but the call of
futex_atomic_cmpxchg_inatomic() happens with pagefaults disabled.
On architectures, which use the asm-generic implementation or have a runtime
CPU feature detection, a -ENOSYS return value disables the PI/robust features.
On architectures with a working implementation the call returns -EFAULT and
the PI/robust features are enabled.
The relevant syscalls return -ENOSYS and the robust list exit code is blocked,
when the detection fails.
Fixes http://lkml.org/lkml/2008/2/11/149
Originally reported by: Lennart Buytenhek
Signe...
David Miller writes:
> From: Mikael Pettersson <mikpe@it.uu.se>
> Date: Tue, 26 Feb 2008 09:55:50 +0100
>
> > Minor update: rc2-git7 has the slow initial console behaviour,
> > but successfully switches to the framebuffer. rc2-git8 however
> > hangs in the console handover. So I'll bisect git7->git8 next.
>
> Between the VT layer registering its console and the atyfb
> driver initializing we get a crash, and it happens on all
> sparc64 systems. It is caused by this commit and I am working
> on a fix:
>
> commit a0c1e9073ef7428a14309cba010633a6cd6719ea
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date: Sat Feb 23 15:23:57 2008 -0800
>
> futex: runtime enable pi and robust functionality

My git7->git8 bisection yesterday independently also arrived
at that specific commit as being the culprit. Bracketing the
offending cmpxchg_futex_value_locked(NULL, 0, 0) call with
#if 0 .. #endif was enough to make my kernel boot.

I'll try your do_kernel_fault() patch later today.

/Mikael
--
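For readers following along, the runtime probe described in the quoted commit message (and the call Mikael bracketed out) can be modelled in user space roughly as below. This is only a sketch under stated assumptions: `cmpxchg_fn`, `cmpxchg_unsupported()`, `cmpxchg_supported()` and `probe_futex_cmpxchg()` are hypothetical names invented for illustration, the "supported" model is not actually atomic, and the NULL check stands in for the page fault that a real implementation would take and turn into -EFAULT through its exception table.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an arch's futex_atomic_cmpxchg_inatomic(). */
typedef int (*cmpxchg_fn)(int *uaddr, int oldval, int newval);

/* Models the asm-generic default: the operation is not implemented. */
static int cmpxchg_unsupported(int *uaddr, int oldval, int newval)
{
    (void)uaddr;
    (void)oldval;
    (void)newval;
    return -ENOSYS;
}

/*
 * Models a working implementation. In the kernel the NULL access
 * would fault and the fixup would return -EFAULT; here we simply
 * test the pointer. The compare-and-exchange is NOT atomic.
 */
static int cmpxchg_supported(int *uaddr, int oldval, int newval)
{
    int cur;

    if (uaddr == NULL)
        return -EFAULT;
    cur = *uaddr;
    if (cur == oldval)
        *uaddr = newval;
    return cur;
}

/*
 * The probe: call the primitive with a NULL pointer. -EFAULT means
 * "implemented, the access just faulted"; -ENOSYS means "no support,
 * keep the PI/robust features disabled".
 */
static bool probe_futex_cmpxchg(cmpxchg_fn fn)
{
    int ret = fn(NULL, 0, 0);

    assert(ret == -EFAULT || ret == -ENOSYS);
    return ret == -EFAULT;
}
```

In this model `probe_futex_cmpxchg(cmpxchg_supported)` reports support and `probe_futex_cmpxchg(cmpxchg_unsupported)` does not. The sparc64 breakage discussed in this thread corresponds to a third outcome the model does not cover: the deliberate fault itself being rejected by the architecture's fault handler.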
From: David Miller <davem@davemloft.net>
Date: Tue, 26 Feb 2008 16:49:00 -0800 (PST)
[ Thomas, forgot to CC: you earlier, changeset
a0c1e9073ef7428a14309cba010633a6cd6719ea ("futex: runtime enable pi
and robust functionality") broke sparc64. ]
The following patch will let things "work" but the trick being used
here by the FUTEX layer is borderline valid in my opinion.
Basically for 10+ years on sparc64 we've had this check here in the
fault path, which makes sure that if we're processing an exception
table entry we really, truly, are doing an access to userspace from
the kernel. Otherwise we OOPS.
What the FUTEX checking code is doing now is doing a "user" access
with set_fs(KERNEL_DS) since it runs from the kernel bootup early init
sequence. And this is illegal according to the existing checks.
When we do set_fs(KERNEL_DS) then pass a "user" pointer down
into a system call or something like that, we give it a pointer
that "cannot fault". So if we get into the fault handling
path here for a case like that we really do want to scream and
print out an OOPS message in my opinion.
I realize that not many platforms other than sparc64 can check
for things this precisely, but it's something to consider.
Did this FUTEX change go into -stable too?
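The check described above can be sketched in stand-alone C, mirroring the lines the patch in this message removes. This is a rough model rather than the kernel code: `effective_asi()`, `fixup_allowed()` and `demo_checks()` are hypothetical names, the `ASI_P` and `ASI_AIUS` values are assumed to match asm/asi.h, and the example assumes that set_fs(KERNEL_DS) leaves the primary ASI in %asi while USER_DS selects a secondary one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, believed to match asm/asi.h. */
#define ASI_AIUS 0x11u /* as-if-user secondary: LS bit set   */
#define ASI_P    0x80u /* primary (kernel):     LS bit clear */

/*
 * For a load/store-alternate instruction, recover the ASI that was
 * really used: from the %asi field saved in TSTATE when the i bit
 * (0x2000) is set, otherwise from the immediate field in the insn.
 */
static uint8_t effective_asi(uint32_t insn, uint64_t tstate, uint8_t asi)
{
    if (asi == ASI_P && (insn & 0xc0800000u) == 0xc0800000u) {
        if (insn & 0x2000u)
            asi = (uint8_t)(tstate >> 24);
        else
            asi = (uint8_t)(insn >> 5);
    }
    return asi;
}

/*
 * The old rule: only honour an exception-table entry when the access
 * went through a secondary ("_S") ASI, i.e. a genuine user access.
 */
static bool fixup_allowed(uint32_t insn, uint64_t tstate, uint8_t asi,
                          bool has_extable_entry)
{
    return (effective_asi(insn, tstate, asi) & 0x1u) && has_extable_entry;
}

/* Usage example: the same instruction under USER_DS and KERNEL_DS. */
static void demo_checks(void)
{
    uint32_t insn = 0xc0802000u; /* alternate-space access, i bit set */

    /* %asi = ASI_AIUS: treated as a user access, fixup is taken. */
    assert(fixup_allowed(insn, (uint64_t)ASI_AIUS << 24, ASI_P, true));
    /* %asi = ASI_P, as with KERNEL_DS: rejected, the kernel would OOPS. */
    assert(!fixup_allowed(insn, (uint64_t)ASI_P << 24, ASI_P, true));
}
```

Under these assumptions the second case is the one the FUTEX probe hits at boot: an exception-table entry exists, but the access was made with a kernel ASI, so the old check refuses the fixup.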
diff --git a/arch/sparc64/mm/fault.c b/arch/sparc64/mm/fault.c
index e2027f2..9183633 100644
--- a/arch/sparc64/mm/fault.c
+++ b/arch/sparc64/mm/fault.c
@@ -244,16 +244,8 @@ static void do_kernel_fault(struct pt_regs *regs, int si_code, int fault_code,
 	if (regs->tstate & TSTATE_PRIV) {
 		const struct exception_table_entry *entry;
-		if (asi == ASI_P && (insn & 0xc0800000) == 0xc0800000) {
-			if (insn & 0x2000)
-				asi = (regs->tstate >> 24);
-			else
-				asi = (insn >> 5);
-		}
-
-		/* Look in asi.h: All _S asis have LS bit set */
-		if ((asi & 0x1) &&
-		    (entry = search_exception_tables(regs->tpc))) {
+		entry = search_exception_tables(regs->tpc);
+		if (entry) {
 			regs->tpc = entr...
David Miller writes:
> From: David Miller <davem@davemloft.net>
> Date: Tue, 26 Feb 2008 16:49:00 -0800 (PST)
>
> [ Thomas, forgot to CC: you earlier, changeset
> a0c1e9073ef7428a14309cba010633a6cd6719ea ("futex: runtime enable pi
> and robust functionality") broke sparc64. ]
>
> > From: Mikael Pettersson <mikpe@it.uu.se>
> > Date: Tue, 26 Feb 2008 09:55:50 +0100
> >
> > > Minor update: rc2-git7 has the slow initial console behaviour,
> > > but successfully switches to the framebuffer. rc2-git8 however
> > > hangs in the console handover. So I'll bisect git7->git8 next.
> >
> > Between the VT layer registering its console and the atyfb
> > driver initializing we get a crash, and it happens on all
> > sparc64 systems. It is caused by this commit and I am working
> > on a fix:
>
> The following patch will let things "work" but the trick being used
> here by the FUTEX layer is borderline valid in my opinion.
>
> Basically for 10+ years on sparc64 we've had this check here in the
> fault path, which makes sure that if we're processing an exception
> table entry we really, truly, are doing an access to userspace from
> the kernel. Otherwise we OOPS.
>
> What the FUTEX checking code is doing now is doing a "user" access
> with set_fs(KERNEL_DS) since it runs from the kernel bootup early init
> sequence. And this is illegal according to the existing checks.
>
> When we do set_fs(KERNEL_DS) then pass a "user" pointer down
> into a system call or something like that, we give it a pointer
> that "cannot fault". So if we get into the fault handling
> path here for a case like that we really do want to scream and
> print out an OOPS message in my opinion.
>
> I realize that not many platforms other than sparc64 can check
> for things this precisely, but it's s...
From: Mikael Pettersson <mikpe@it.uu.se>
Thank you for testing.
--
So it would be correct to set_fs(USER_DS) then do the check and switch
It's queued, AFAIK
Thanks,
tglx
--
From: Thomas Gleixner <tglx@linutronix.de>
No, I'm saying it would be better not to take faults purposefully
in the kernel address space. We don't have a usable user address
space setup at this point in the boot, so using USER_DS would be
even worse. I think I'll just add a different version of the sanity
check to this sparc64 code later on, one that will take into
consideration this KERNEL_DS case because I can see how it could be
useful in other
Crap, I'll need to push my fix there too.
--
I would have preferred not to. The hassle is that we need to figure
out, whether it works or not _before_ any user space program can use
the interfaces. We could omit the check for archs where the
Ok.
Thanks,
tglx
--
From: Mikael Pettersson <mikpe@it.uu.se>
Thanks for doing this research.
--
