Re: kerneloops.org: 2.6.26-rc possible regression in ext3

Previous thread: kerneloops.org: 2.6.26-rc possible regression in ext3 by Arjan van de Ven on Thursday, June 19, 2008 - 1:34 am. (7 messages)

Next thread: [PATCH 0/3 2.6.27] cxgb3i: Add iSCSI driver by Karen Xie on Thursday, June 19, 2008 - 1:10 am. (2 messages)
To: Linux Kernel Mailing List <linux-kernel@...>
Cc: Linus Torvalds <torvalds@...>, <linux-ext4@...>, Andrew Morton <akpm@...>
Date: Thursday, June 19, 2008 - 1:36 am

In the kerneloops.org stats, a new oops is rapidly climbing the charts.
The oops is a page fault in the ext3 "do_split" function, and the first
report of it was with 2.6.26-rc6-git3.

It happens with various applications; the backtraces are at:

http://www.kerneloops.org/search.php?search=do_split

but are generally of this pattern:

*do_split
ext3_add_entry
ext3_rename
vfs_rename
... <various paths into vfs_rename> ...

or

*do_split
? add_dirent_to_buf
ext3_add_entry
ext3_new_inode
ext3_add_nondir
ext3_create
vfs_create
....

did we change anything in ext

--

To: Arjan van de Ven <arjan@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>, <linux-ext4@...>, Andrew Morton <akpm@...>
Date: Thursday, June 19, 2008 - 10:00 am

Arjan, I was just looking at kerneloops last night, seeing the count for
this oops climb, and was wishing there were some way to annotate an oops
signature with more info. If I could have tagged this with the RH
bugzilla nr. it might have saved a lot of time for folks. Is this
feasible? Or is finding the oops text in bugzilla the only way?

Thanks,

-Eric

--

To: Eric Sandeen <sandeen@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>, <linux-ext4@...>, Andrew Morton <akpm@...>
Date: Thursday, June 19, 2008 - 10:07 am

There's a way to add a description to oopses (you might have seen some of these
descriptions already); however, I've not implemented an account system yet, so for
now I'm the only one who can add these.
--

To: Arjan van de Ven <arjan@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>, <linux-ext4@...>, Andrew Morton <akpm@...>
Date: Thursday, June 19, 2008 - 10:17 am

Ok, that was my guess. I'll shoot you an email next time. :)

Thanks,
-Eric
--

To: Arjan van de Ven <arjan@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>, <linux-ext4@...>, Andrew Morton <akpm@...>
Date: Thursday, June 19, 2008 - 1:42 am

This is a bug in rawhide in gcc miscompiling something...

https://bugzilla.redhat.com/show_bug.cgi?id=451068

Dave.
--

To: Dave Airlie <airlied@...>
Cc: Arjan van de Ven <arjan@...>, Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>, <linux-ext4@...>, Andrew Morton <akpm@...>
Date: Thursday, June 19, 2008 - 4:11 am

If I understand it correctly that's a bug in upstream gcc 4.3.1
(but not in gcc 4.3.0)?

Expect a lot more of this to pop up in the future.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

--

To: Adrian Bunk <bunk@...>
Cc: Dave Airlie <airlied@...>, Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>, <linux-ext4@...>, Andrew Morton <akpm@...>
Date: Thursday, June 19, 2008 - 9:40 am

It's better to find out whether the gcc guys made a testcase for this bug (they normally do) and
test based on that.
--

To: Arjan van de Ven <arjan@...>
Cc: Dave Airlie <airlied@...>, Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>, <linux-ext4@...>, Andrew Morton <akpm@...>
Date: Thursday, June 19, 2008 - 11:10 am

The gcc Bugzilla contains a testcase.

But how do you plan to integrate it into a kernel build?

cu
Adrian


--

To: Adrian Bunk <bunk@...>
Cc: Dave Airlie <airlied@...>, Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>, <linux-ext4@...>, Andrew Morton <akpm@...>
Date: Thursday, June 19, 2008 - 11:18 am

We already have several of these.
Just look at scripts/gcc-x86_64-has-stack-protector.sh for an example of such a beast.
--

To: Arjan van de Ven <arjan@...>
Cc: Dave Airlie <airlied@...>, Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>, <linux-ext4@...>, Andrew Morton <akpm@...>
Date: Thursday, June 19, 2008 - 11:25 am

Checking whether gcc supports some flags is easy.

But miscompilations are a different issue.

Especially since we also want to reject broken gcc versions for cross
compilations.
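
A version gate is one way to reject known-bad compilers without running any compiled code, which matters when cross compiling. The following is a minimal sketch in Python, not an existing kernel script: the names `KNOWN_BAD_GCC`, `parse_version`, and `rejection_reason` are hypothetical, and the table only records what posters elsewhere in this thread reported about 4.3.0 and 4.3.1.

```python
# Illustrative table only: the entries repeat claims made by posters in
# this thread and are not an authoritative list of broken compilers.
KNOWN_BAD_GCC = {
    (4, 3, 0): "PR36339 (reported in this thread as fixed in 4.3.1)",
    (4, 3, 1): "bug 36533 (the miscompilation discussed in this thread)",
}


def parse_version(text):
    """Turn a string such as '4.3.1' into a tuple of three ints.

    Missing components are padded with zeros, so '4.3' becomes (4, 3, 0).
    """
    parts = [int(p) for p in text.strip().split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def rejection_reason(version_text):
    """Return why a version is rejected, or None if it is not listed."""
    return KNOWN_BAD_GCC.get(parse_version(version_text))
```

The version string could come from the compiler itself (for example the output of `gcc -dumpversion`). A gate like this only knows about releases someone has listed, so it would also reject distribution builds that carry a backported fix under the same version number.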

cu
Adrian


--

To: Adrian Bunk <bunk@...>
Cc: Dave Airlie <airlied@...>, Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>, <linux-ext4@...>, Andrew Morton <akpm@...>
Date: Thursday, June 19, 2008 - 11:27 am

Have you actually looked at this script?
You didn't, since the script doesn't check if gcc supports some flag.
It checks very specifically for a code generation pattern...

Please go look at the script first before responding.
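
For readers without the tree at hand, the general shape of a code-generation check is: feed a tiny test source to the compiler, ask for assembly on stdout, and search that output for a pattern. Below is a rough sketch of that idea in Python; it is not the contents of scripts/gcc-x86_64-has-stack-protector.sh, and the function name `emits_pattern` and its parameters are made up for illustration.

```python
import re
import subprocess


def emits_pattern(compile_cmd, source, pattern):
    """Pipe `source` into `compile_cmd` and search its stdout for `pattern`.

    `compile_cmd` is an argument list that reads the source from stdin and
    writes the generated assembly to stdout. Returns False if the command
    cannot be started or exits with a non-zero status.
    """
    try:
        result = subprocess.run(
            compile_cmd,
            input=source,
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # Compiler binary missing or not executable.
        return False
    if result.returncode != 0:
        return False
    return re.search(pattern, result.stdout) is not None
```

A call might look like `emits_pattern(["gcc", "-S", "-x", "c", "-O2", "-", "-o", "-"], testcase_source, bad_pattern)`, where the flags, `testcase_source`, and `bad_pattern` are placeholders that would have to come from the actual gcc testcase. Since only the assembly text is inspected and nothing is executed, the same check can in principle be pointed at a cross compiler.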

--

To: Arjan van de Ven <arjan@...>
Cc: Dave Airlie <airlied@...>, Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>, <linux-ext4@...>, Andrew Morton <akpm@...>
Date: Thursday, June 19, 2008 - 11:43 am

I did look, but I missed the last pipe...

Do we know for sure this bug can only trigger on 32bit x86?

Or is there anything else I'm missing in gcc-x86_64-has-stack-protector.sh
that would allow using this approach to check for wrong code generation
caused by platform-independent gcc bugs?

cu
Adrian


--

To: Adrian Bunk <bunk@...>
Cc: Dave Airlie <airlied@...>, Arjan van de Ven <arjan@...>, Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>, <linux-ext4@...>, Andrew Morton <akpm@...>
Date: Thursday, June 19, 2008 - 4:32 am

Adrian Bunk writes:
> On Thu, Jun 19, 2008 at 03:42:34PM +1000, Dave Airlie wrote:
> > On Thu, Jun 19, 2008 at 3:36 PM, Arjan van de Ven <arjan@linux.intel.com> wrote:
> > > In the kerneloops.org stats, a new oops is rapidly climbing the charts.
> > > The oops is a page fault in the ext3 "do_split" function, and the first
> > > report of it was with 2.6.26-rc6-git3.
> > >
> > > It happens with various applications; the backtraces are at:
> > >
> > > http://www.kerneloops.org/search.php?search=do_split
> >
> > This is a bug in rawhide in gcc miscompiling something...
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=451068
>
> If I understand it correctly that's a bug in upstream gcc 4.3.1
> (but not in gcc 4.3.0)?
>
> Expect a lot more of this to pop up in the future.
> Should we #error for gcc 4.3.1?

There are other nasty bugs in gcc-4.3.0. I actually
had to completely ban 4.3.0 in a user-space project
I'm involved with (Erlang) due to gcc PR36339 (fixed
in 4.3.1).

What's the gcc bugzilla number for this new 4.3.1 bug?
--

To: Mikael Pettersson <mikpe@...>
Cc: Dave Airlie <airlied@...>, Arjan van de Ven <arjan@...>, Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>, <linux-ext4@...>, Andrew Morton <akpm@...>
Date: Thursday, June 19, 2008 - 6:49 am

#36533

cu
Adrian


--

To: Dave Airlie <airlied@...>
Cc: Arjan van de Ven <arjan@...>, Linux Kernel Mailing List <linux-kernel@...>, <linux-ext4@...>, Andrew Morton <akpm@...>
Date: Thursday, June 19, 2008 - 2:42 am

Gaah. I should read all my email instead of wasting my time trying to
match up the code with what I can reproduce..

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Dave Airlie <airlied@...>, Linux Kernel Mailing List <linux-kernel@...>, <linux-ext4@...>, Andrew Morton <akpm@...>
Date: Thursday, June 19, 2008 - 3:09 am

Unfortunately, kerneloops.org didn't pick up the link to this bug (because
the oops in the bug was a jpeg...). Maybe one day, if I'm really bored,
I'll implement OCR in it ;)

Sorry about wasting your time.

--

To: Dave Airlie <airlied@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, Linus Torvalds <torvalds@...>, <linux-ext4@...>, Andrew Morton <akpm@...>
Date: Thursday, June 19, 2008 - 1:48 am

Thanks for letting us know so fast!
I've marked this one in the database as a Fedora gcc bug.
--
