Re: [PATCH] capability: WARN when invalid capability is requested rather than BUG/panic

Previous thread: Re: [PATCH] x86: enable CPUID on Cyrix cpus with CPUID disabled by krzysztof.h1 on Tuesday, September 30, 2008 - 6:06 am. (1 message)

Next thread: PROBLEM: processes claim CPU indefinitely on Intel(R) Core(TM)2 Quad by Reto Glauser on Tuesday, September 30, 2008 - 7:19 am. (1 message)
From: Eric Paris
Date: Tuesday, September 30, 2008 - 6:55 am

The firegl ATI/AMD closed source proprietary driver has some 2,733
panics registered on kerneloops.org and recently SELinux was blamed for
BUGing because of this driver.

http://www.kerneloops.org/search.php?search=firegl_ioctl&btnG=Function+Search

Basically it looks like the firegl_ioctl() function is calling capable()
with a huge value.  cap_capable dereferences an array (64 bits long) WAY
off the end.  I know a specific example would be referencing the array
c.cap[0x7B4BB01].  As dumb luck would have it occasionally this doesn't
pagefault / panic and so we get to selinux code which has a clean check
for values which are obviously too large and BUG().

This patch adds a WARN_ONCE() to cap_capable() so we will stop
dereferencing random spots of memory and will cleanly tell the obviously
broken driver that it doesn't have that ridiculous permissions.  No idea
if the driver is going to handle EPERM but anything that calls capable
and doesn't expect a denial has got to be the worst piece of code ever
written.....  I could return EINVAL, but I think its clear that noone
has capabilities over 64 so clearly they don't have that permission.

This 'could' be considered a regression since 2.6.24.  Neither SELinux
nor the capabilities system had a problem with ginormous request values
until we got 64 bit support, although this is OBVIOUSLY a bug with the
out of tree closed source driver....

Oh yeah, and if anyone knows people at ATI/AMD tell them to fix their
broken driver....

Signed-off-by: Eric Paris <eparis@redhat.com>

---

 security/commoncap.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/security/commoncap.c b/security/commoncap.c
index e4c4b3f..92715e7 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -50,6 +50,11 @@ EXPORT_SYMBOL(cap_netlink_recv);
  */
 int cap_capable (struct task_struct *tsk, int cap)
 {
+	if (unlikely(!cap_valid(cap))) {
+		WARN_ONCE(1, "Invalid capability:%d\n", cap);
+		return -EPERM;
+	}
+
 ...
From: James Morris
Date: Tuesday, September 30, 2008 - 7:23 am

An issue here is whether we should be adding workarounds in the mainline 
kernel for buggy closed drivers.  Papering over problems rather than 
getting them fixed does not seem like a winning approach.  Especially 
problems which are unexpectedly messing with kernel security APIs.

Also, won't this encourage vendors of such drivers to continue with this 
behavior, while discouraging those vendors who are doing the right thing?

Do we know if this even really helps the user?  For all we know, the 
driver may simply crash differently with an -EPERM.



- James
-- 
James Morris
<jmorris@namei.org>
--

From: Eric Paris
Date: Tuesday, September 30, 2008 - 7:36 am

I don't know, looking at the feelings on "Can userspace bugs be kernel
regressions" leads me to believe that when we break something that once
worked we are supposed to fix it.

http://lwn.net/Articles/292143/

I don't think the proprietary closed source nature of the driver makes
it any less our problem to not make changes which cause the kernel to

Discouraging people who open source their drivers and put them in the
kernel?  obviously not.  encouraging crap?  well, I hope we fix

Well, before the 64 bit capabilities change we did:

(cap_t(c) & CAP_TO_MASK(flag))

so a huge value for "flag" got masked off.

After 64 bit capabilities we do:

((c).cap[CAP_TO_INDEX(flag)] & CAP_TO_MASK(flag))

so a huge flag causes an array index out of bounds and either explodes
here or continues onto SELinux where it BUG().

So this is regression.  It would have gotten an EPERM, now it gets a
BUG/panic.

Yes ATI needs to fix their driver, but we broke it and I don't remember
the driver not working on 2.6.24 and earlier....

-Eric

--

From: Serge E. Hallyn
Date: Tuesday, September 30, 2008 - 8:38 am

Perhaps we should have CAP_TO_INDEX mask itself?

#define CAP_TO_INDEX(x)		(((x) >> 5) & _KERNEL_CAPABILITY_U32S)

Though I still think it's not unreasonable to simply ask for the driver
--

From: Eric Paris
Date: Tuesday, September 30, 2008 - 9:07 am

Well, you save a branch and won't get the pagefault so it does 'fix' the
pagefault/panic from cap code.  It doesn't tell us when others screw up
and SELinux is still possibly going to BUG().  We are also going to
actually be returning a permission decision not on what was requested
but on something wholely different.

I like mine better, but I'm ok with yours and can just do my changes in
SELinux if this is how cap wants to handle it.  I don't really like the
idea of mutating the inputs and then making the security decision based
on that mutation rather than on the original inputs (and yes, I realize

I'm not going to argue that the driver needs fixed and that is the real
problem.  I know its been filed with them and the response was that
there is no support for linux.  I have today tried to poke the path I
know of between Red Hat and them to ask them to take a look.

--

From: Serge E. Hallyn
Date: Tuesday, September 30, 2008 - 9:28 am

Heh I don't like either one, just thought this would reduce the overhead
a bit :)

--

From: Eric Paris
Date: Tuesday, September 30, 2008 - 10:22 am

No argument from me that patching up for buggy drivers sucks.  Yours
would be less overhead, and it would return the cap system back to
pre-2.6.25 operation (garbage in garbage out but no panic).  Since we
already have the branch in SELinux its no 'extra' overhead to EPERM
there instead of here (garbage in EPERM out).

-Eric

--

From: Arjan van de Ven
Date: Tuesday, September 30, 2008 - 10:28 am

On Tue, 30 Sep 2008 13:22:30 -0400

to be honest, this is really a case of 
panic("This stuff is really borken")

if it passes some random value, what other api's does it pass a random
value to ?

(and in addition, random values to security critical APIs deserve a
process kill, because it could well be an exploit attempt at guessing
something. At least by not letting it live it's harder to get such type
of exploits to be able to guess things. So imo, BUG() is the right
answer)




-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--

From: Eric Paris
Date: Wednesday, October 1, 2008 - 8:32 am

Do we have any concern of a module being compiled against a new kernel
say with cap number 35 defined and then loaded into a kernel with only
34 capabilities?  Do we care about that forward compatibility?  If we
care BUG is scary.  EPERM would be the right thing since clearly on this
kernel the process can't possibly have cap #35.

We really have 4 options (in the order I like them).

1) do nothing (garbage in garbage out, sometimes panic sometimes not)
2) mask CAP_TO_INDEX (garbage in garbage out, no panic)
3) BUG_ON(!cap_valid(flag)) (garbage in BUG out, no panic)
4) WARN_ON/EPERM (garbage in EPERM out, no panic)

SELinux already sorta does #3 and #4 (we will panic if cap > 64 and will
EPERM between the max cap and 64) but I really don't like being blamed
when it's not my fault.  SELinux takes enough crap when people's systems
don't work and this time its clearly not my fault, which is why I'm
pushing this.

If we believe the capability system should take path's 1, 2, or 4 I'm
going to take path 4 in SELinux.  If capabilities wants to take path 3,
I'm ok with that too.  Its going to break a lot of people's machines I'm
afraid, but it would force ATI to fix their crap....

-Eric

--

From: Arjan van de Ven
Date: Wednesday, October 1, 2008 - 8:39 am

On Wed, 01 Oct 2008 11:32:40 -0400

No!
If you don't compile the module against the right kernel you get what

really; if you get garbage into the security system, BUG/panic is the
only way to go. You *know* there is an issue around security somehow,
(be it the "give me root" ioctl in fireglx or something else), and
to continue just keeps your machine exposed. That really is no option.

As to current users of said broken module: they crash-and-burn today
already, but that's between them and their module vendor if they chose
to run some binary clunker.



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--

From: Serge E. Hallyn
Date: Wednesday, October 1, 2008 - 8:44 am

Assuming you have a kernel with your patch for 4, could you just run
some perf tests vs the unpatched kernel to show there's really no
meaningful performance impact?

-serge
--

From: Andrew G. Morgan
Date: Saturday, October 4, 2008 - 6:30 pm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


[Sorry to get to this late.]

I'd prefer:

3) BUG_ON(!cap_valid(flag)) (garbage in BUG out, no panic)

It seems absurd to be guessing what a buggy driver might be wanting -


if distributions want to silently work around it (with some sort of
variant of 4 or perhaps 2) until the vendor of this driver has time to
fix it, then that doesn't seem unreasonable. But it seems to me that the
upstream kernel shouldn't institutionalize this sort of nonsense.

Cheers

Andrew
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFI6Bi9+bHCR3gb8jsRArIBAKCuODkju29dCosHWPiXZVKoNP0FSwCaAzPx
EWihO1+F0qd0BFZarm/CFKM=
=HJik
-----END PGP SIGNATURE-----
--

Previous thread: Re: [PATCH] x86: enable CPUID on Cyrix cpus with CPUID disabled by krzysztof.h1 on Tuesday, September 30, 2008 - 6:06 am. (1 message)

Next thread: PROBLEM: processes claim CPU indefinitely on Intel(R) Core(TM)2 Quad by Reto Glauser on Tuesday, September 30, 2008 - 7:19 am. (1 message)