Re: [PATCH] Smackv10: Smack rules grammar + their stateful parser

Previous thread: PROBLEM: fakephp hangs terminal in 2.6.22 (and 2.6.21), does not Hot Plug by john_flowers on Friday, November 2, 2007 - 5:16 pm. (3 messages)

Next thread: [PATCH] eCryptfs: Release mutex on hash error path by Michael Halcrow on Friday, November 2, 2007 - 4:51 pm. (1 message)
To: <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>
Date: Friday, November 2, 2007 - 4:50 pm

From: Casey Schaufler <casey@schaufler-ca.com>

Smack is the Simplified Mandatory Access Control Kernel.

Smack implements mandatory access control (MAC) using labels
attached to tasks and data containers, including files, SVIPC,
and other tasks. Smack is a kernel based scheme that requires
an absolute minimum of application support and a very small
amount of configuration data.

Smack uses extended attributes and
provides a set of general mount options, borrowing technics used
elsewhere. Smack uses netlabel for CIPSO labeling. Smack provides
a pseudo-filesystem smackfs that is used for manipulation of
system Smack attributes.

The patch, patches for ls and sshd, a README, a startup script,
and x86 binaries for ls and sshd are also available on

http://www.schaufler-ca.com

This version has been tested with 2.6.24-rc1. Development has been
done using Fedora Core 7 in a virtual machine environment and on an
old Sony laptop. There are pending changes required to the netlabel
api patch for 2.6.24-rc1-git11 (perhaps earlier).

Smack provides mandatory access controls based on the label attached
to a task and the label attached to the object it is attempting to
access. Smack labels are deliberately short (1-23 characters) text
strings. Single character labels using special characters are reserved
for system use. The only operation applied to Smack labels is equality
comparison. No wildcards or expressions, regular or otherwise, are
used. Smack labels are composed of printable characters and may not
include "/".

A file always gets the Smack label of the task that created it.

Smack defines and uses these labels:

"*" - pronounced "star"
"_" - pronounced "floor"
"^" - pronounced "hat"
"?" - pronounced "huh"

The access rules enforced by Smack are, in order:

1. Any access requested by a task labeled "*" is denied.
2. A read or execute access requested by a task labeled "^"
is permitted.
3. A read or execute access requested on an object label...

To: Casey Schaufler <casey@...>
Cc: <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>
Date: Saturday, November 3, 2007 - 12:43 pm

Hi All,

After agreeing with Casey on the "load" input grammar yesterday, here's
the final grammar and its parser (which needs more testing):

A Smack Rule in an "egrep" format is:

"^[:space:]*Subject[:space:]+Object[:space:]+[rwxaRWXA-]+[:space:]*\n"

where Subject/Object strings are in the form:

"^[^/[:space:][:cntrl:]]{1,SMK_MAXLEN}$"

Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
---

Bashv3 builtin "echo" behaves very strangely to -EINVAL. It sends all
the buffers that causes -EINVAL again in subsequent echo invocations.

i.e.
echo "Invalid Rule" > /smack/load # -EINVAL returned
echo "Valid Rule" > /smack/load

In seconod iteration, echo sends the first invalid buffer again
then sends the new one. This causes a
"Invalid Rule\nValid Rule" buffer sent to write().

IMHO, this is a bug in builtin echo. The external /bin/echo doesn't
cause such strange behaviour.

diff --git a/security/smack/smack.h b/security/smack/smack.h
index e0b7d26..8dcdb79 100644
--- a/security/smack/smack.h
+++ b/security/smack/smack.h
@@ -219,4 +219,16 @@ static inline char *smk_of_inode(const struct inode *isp)
return sip->smk_inode;
}

+static inline int isblank(char c)
+{
+ return (c == ' ' || c == '\t');
+}
+
+#define swap(x, y, type) \
+do { \
+ type tmp = x; \
+ x = y; \
+ y = tmp; \
+} while (0); \
+
#endif /* _SECURITY_SMACK_H */
diff --git a/security/smack/smackfs.c b/security/smack/smackfs.c
index 3572ae5..0b1b530 100644
--- a/security/smack/smackfs.c
+++ b/security/smack/smackfs.c
@@ -25,6 +25,7 @@
#include <net/netlabel.h>
#include <net/cipso_ipv4.h>
#include <linux/seq_file.h>
+#include <linux/ctype.h>
#include "smack.h"

/*
@@ -67,6 +68,39 @@ struct smk_list_entry *smack_list;
#define SEQ_READ_FINISHED 1

/*
+ * Disable concurrent writing open() operations
+ */
+static struct semaphore smack_write_sem;
+
+/*
+ * States for parsing /smack/load ru...

To: Ahmed S. Darwish <darwish.07@...>
Cc: Casey Schaufler <casey@...>, <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>
Date: Tuesday, November 6, 2007 - 2:33 am

Can you limit this to 7bit ASCII and use isascii() somewhere?

Otherwise I'd expect funny things to happen when you e.g. use isspace()
on the UTF-8 encoded character à.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

-

To: Adrian Bunk <bunk@...>
Cc: Ahmed S. Darwish <darwish.07@...>, Casey Schaufler <casey@...>, <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>
Date: Tuesday, November 6, 2007 - 4:26 am

Actually, you don't need to. You tell them it expects UTF-8 encoded
strings and be done with it. All US-ASCII characters from 0 through
127 (IE: high bit clear) are exactly the same in UTF-8, and UTF-8
special characters have the high bit set in all bytes. Therefore you
just assume that anything with the high bit set is part of a word and
you can handle basic UTF-8. (It doesn't work on special UTF-8 space
characters like nonbreaking space and similar, but handling those is
significantly more complicated).

Cheers,
Kyle Moffett

-

To: Kyle Moffett <mrmacman_g4@...>
Cc: Ahmed S. Darwish <darwish.07@...>, Casey Schaufler <casey@...>, <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>
Date: Tuesday, November 6, 2007 - 4:56 am

The documentations says:
"Smack labels cannot contain unprintable characters or the "/" (slash)
character."

What you propose might contain unprintable characters, and it might even

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

-

To: Adrian Bunk <bunk@...>
Cc: Kyle Moffett <mrmacman_g4@...>, Casey Schaufler <casey@...>, <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>
Date: Tuesday, November 6, 2007 - 7:34 am

Hi,

As far as I understand the problem now, isspace() accepts the 0xa0
character which might collide with some of UTF-8 encoded characters
cause the high bit is set.

I used "if (!isspace(c) && !isgraph(c)) return -EINVAL;" to test
rules' characters validity which seems not enough. I'll add !isascii(c)
in the condition and ask Casey to change the documentation to be
something like:

Smack labels are represented in ASCII characters, they cannot contain
unprintable characters or the '/' (slash) character.

and in write():
if (!isascii(c) return -EINVAL;
if (!isspace(c) && !isgraph(c)) return -EINVAL;

This satisfy above customized labels rule, right ?

Regards,

--
Ahmed S. Darwish
Homepage: http://darwish.07.googlepages.com
Blog: http://darwish-07.blogspot.com
-

To: Ahmed S. Darwish <darwish.07@...>
Cc: Kyle Moffett <mrmacman_g4@...>, Casey Schaufler <casey@...>, <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>
Date: Tuesday, November 6, 2007 - 7:47 am

[Empty message]
To: Adrian Bunk <bunk@...>
Cc: Kyle Moffett <mrmacman_g4@...>, Casey Schaufler <casey@...>, <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>
Date: Tuesday, November 6, 2007 - 8:23 am

I admit I'm not experienced in such encoding stuff, but shouldn't the
ASCII and the ASCII-compatible UTF-8 encodings be enough for the

Won't this complicate the code too much ?

--
Ahmed S. Darwish
Homepage: http://darwish.07.googlepages.com
Blog: http://darwish-07.blogspot.com
-

To: Ahmed S. Darwish <darwish.07@...>
Cc: Adrian Bunk <bunk@...>, Casey Schaufler <casey@...>, <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>
Date: Tuesday, November 6, 2007 - 8:49 am

Well the VFS (for example) certainly doesn't support any encodings
other than various extended-ASCII forms (which includes UTF-8).
Something like UTF-16 has extra null characters in-between every
normal character, and as such would fail completely if passed to the
VFS.

Personally I think that isspace() accepting character 0xA0 is a bug,
as there are several variants of extended ASCII only one of which has
that character as a space. Others have it as á (accented A), etc.
In addition the "canonical" internal text format of the kernel is
UTF-8 as that encoding can represent any character in any other
encoding and it is backwards-compatible with traditional ASCII.

Cheers,
Kyle Moffett-

To: Kyle Moffett <mrmacman_g4@...>
Cc: Ahmed S. Darwish <darwish.07@...>, Adrian Bunk <bunk@...>, Casey Schaufler <casey@...>, <akpm@...>, <linux-security-module@...>, <linux-kernel@...>
Date: Tuesday, November 6, 2007 - 11:53 am

I think I agree with you. As far as the kernel is concerned, "isspace()"
should just accept the obvious spaces (hardspace, tab, newline), and
*perhaps* the VT/FF kind of things.

You should realize that the kernel <ctype.h> thing is *ancient*. It's
basically there from v0.01, and while the really original one (I just
checked) had all the non-ascii characters not trigger anything, it was
converted to be latin1 in the 2.1.x timeframe.

That's a *loong* time ago. Way before UTF-8 and other things were really
common.

So we should probably just make all the upper 128 bytes go back to "don't
trigger anything in ctype.h" - they'd not be spaces, but they'd not be
control characters or anything else either.

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Ahmed S. Darwish <darwish.07@...>, Adrian Bunk <bunk@...>, Casey Schaufler <casey@...>, Andrew Morton <akpm@...>, <linux-security-module@...>, LKML Kernel <linux-kernel@...>
Date: Wednesday, November 7, 2007 - 6:56 am

Originally isspace() and other similar functions in ctype.h ignored
any character with the high bit set; however this was changed during
the linux 2.1 days to map Latin-1. As following Latin-1 will most
likely break UTF-8 any any *other* encoding that is backwards-
compatible with 7-bit-ASCII, change ctype.c to ignore such characters
completely (the way they were before). Linus seems to think this is
a good thing, and he's the one that wrote the code in the first place.

Signed-off-by: Kyle Moffett <mrlinuxman@mac.com>

---

To: Kyle Moffett <mrmacman_g4@...>
Cc: Ahmed S. Darwish <darwish.07@...>, Casey Schaufler <casey@...>, <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>
Date: Tuesday, November 6, 2007 - 9:34 am

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

-

To: Adrian Bunk <bunk@...>
Cc: Kyle Moffett <mrmacman_g4@...>, Casey Schaufler <casey@...>, <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>
Date: Tuesday, November 6, 2007 - 10:05 am

Great, To summarize the discussion. Will there be a problem in
accepting ASCII and the UTF-8 ASCII _subset_ _only_ and return
-EINVAL for all other cases/ecnodings ?.

i.e. The fragment I sent in a previous message:

/* Filter UTF-8 non-ascii compatible bytes (> 0x7F) */
if (!isascii(c)) return -EINVAL;
/* Filter unwanted ascii chars */

--
Ahmed S. Darwish
Homepage: http://darwish.07.googlepages.com
Blog: http://darwish-07.blogspot.com
-

To: Ahmed S. Darwish <darwish.07@...>
Cc: Kyle Moffett <mrmacman_g4@...>, Casey Schaufler <casey@...>, <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>
Date: Tuesday, November 6, 2007 - 10:10 am

As I already said, this should work fine.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

-

To: Adrian Bunk <bunk@...>
Cc: Kyle Moffett <mrmacman_g4@...>, Casey Schaufler <casey@...>, <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>
Date: Tuesday, November 6, 2007 - 10:30 am

I thought you were objecting not handling the remaining non-ASCII
UTF-8 bytes. Everything is clear now. A Big thank you to everybody in
this thread for your effort on this :).

Regards,

--
Ahmed S. Darwish
Homepage: http://darwish.07.googlepages.com
Blog: http://darwish-07.blogspot.com
-

To: Adrian Bunk <bunk@...>
Cc: Kyle Moffett <mrmacman_g4@...>, Ahmed S. Darwish <darwish.07@...>, Casey Schaufler <casey@...>, <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>
Date: Tuesday, November 6, 2007 - 7:02 am

Just like filenames then. I haven't noticed the vfs explode as a result

Alan
-

To: Casey Schaufler <casey@...>
Cc: <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>, Al Viro <viro@...>
Date: Sunday, November 4, 2007 - 8:56 pm

Same parser with adhering Casey's concers about previous one
and with using the FSM states in a more readable way.

I've double-checked the code for any possible off-by-one/overflow
errors. Could someone overcheck this for any possible hidden
security holes. Al, please :) ?

diff --git a/security/smack/smackfs.c b/security/smack/smackfs.c
index 3572ae5..c9461cb 100644
--- a/security/smack/smackfs.c
+++ b/security/smack/smackfs.c
@@ -25,6 +25,7 @@
#include <net/netlabel.h>
#include <net/cipso_ipv4.h>
#include <linux/seq_file.h>
+#include <linux/ctype.h>
#include "smack.h"

/*
@@ -67,6 +68,47 @@ struct smk_list_entry *smack_list;
#define SEQ_READ_FINISHED 1

/*
+ * Disable concurrent writing open() operations
+ */
+static struct semaphore smack_write_sem;
+
+/*
+ * States for parsing /smack/load rules
+ */
+enum load_state {
+ bol = 0, /* At Beginning Of Line */
+ subject = 1, /* At a "subject" token */
+ object = 2, /* At an "object" token */
+ access = 3, /* At an "access" token */
+ newline = 4, /* At end Of Line */
+ blank = 5, /* At a space or tab */
+};
+
+/*
+ * Represent current parsing state of /smack/load. Struct
+ * also stores data needed between an open-release session's
+ * multiple write() calls
+ */
+static struct smack_load_state {
+ enum load_state state; /* Current FSM parsing state */
+ enum load_state prevstate; /* Previous FSM state */
+ enum load_state prevtoken; /* Handle state = prevstate = blank */
+ struct smack_rule rule; /* Rule to be loaded */
+ int label_len; /* Subject/Object labels length so far */
+ char subject[SMK_LABELLEN]; /* Subject label */
+ char object[SMK_LABELLEN]; /* Object label */
+} *load_state;
+
+/*
+ * Rule's tokens separators are spaces and tabs only
+ */
+static inline int isblank(char c)
+{
+ return (c == ' ' || c == '\t');
+}
+
...

To: Ahmed S. Darwish <darwish.07@...>
Cc: Casey Schaufler <casey@...>, <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>, Al Viro <viro@...>
Date: Sunday, November 11, 2007 - 8:44 am

Perhaps you should make it space, not 'space or tab', and only allow
lowercase permissions? That way, parser will be slightly simpler, and
you'll still have a chance to use 'R' as 'slightly different r'.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

To: Pavel Machek <pavel@...>
Cc: Casey Schaufler <casey@...>, <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>, Al Viro <viro@...>
Date: Sunday, November 11, 2007 - 2:37 pm

Hi Pavel,

Thanks for your care about this. It seems not a lot of people have
noticed, but to stop any objections not related to the core smack
code, Casey decided to let the parsing be done in a user-space utility
that sends the rules to the kernel in a predefined strict format.

You can find how the whole story in the smackv11 announcement here:
http://article.gmane.org/gmane.linux.kernel.lsm/4463

Regards,

--
Ahmed S. Darwish
Homepage: http://darwish.07.googlepages.com
Blog: http://darwish-07.blogspot.com
-

To: Ahmed S. Darwish <darwish.07@...>
Cc: Casey Schaufler <casey@...>, <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>, Al Viro <viro@...>
Date: Saturday, November 10, 2007 - 1:05 pm

...

Two things caught my eye.

Why is the '>' necessary? Could it happen that you had incremented past the
point of equality?

If that could not happen, then in my oppinion '>=' is very misleading when '=='
is really what is needed.

I wonder why it is valid to uncritically use the already incremented label_len
here, without checking its value (like is done above).

It seems strangely asymmetrical. I'm not saying it's wrong, because there may
be a subtle reason as to why it's not, but if that's the case then I think that
subtle reason should be documented with a comment.

Same applies here.

--

/ jakob

-

To: Jakob Oestergaard <jakob@...>
Cc: Casey Schaufler <casey@...>, <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>, Al Viro <viro@...>
Date: Saturday, November 10, 2007 - 3:45 pm

Maximum value label_len could reach is "SMK_MAXLEN". The subjectstr
and objectstr arrays are of "SMK_MAXLEN + 1" length. So I think no
danger is here.

Good spots, thanks a lot for the review.

Regards,

--
Ahmed S. Darwish
Homepage: http://darwish.07.googlepages.com
Blog: http://darwish-07.blogspot.com
-

To: Casey Schaufler <casey@...>
Cc: <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>
Date: Sunday, November 4, 2007 - 4:06 pm

Is it right to do the kzalloc with the semaphore being held? Will the
lock be held forever If kzalloc failed and -ENOMEM was returned ?

Thanks,

--
Ahmed S. Darwish
Homepage: http://darwish.07.googlepages.com
Blog: http://darwish-07.blogspot.com
-

To: Ahmed S. Darwish <darwish.07@...>
Cc: Casey Schaufler <casey@...>, <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>
Date: Sunday, November 4, 2007 - 8:28 am

This sounds like enough for 'NAK'.

Pavel,
who still thinks smack rules should be parsed
in userspace and compiled into selinux rules...

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

To: Pavel Machek <pavel@...>
Cc: Casey Schaufler <casey@...>, <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>, Al Viro <viro@...>
Date: Monday, November 5, 2007 - 5:41 am

Ok, Could someone suggest a better idea please ?.

I thought about packing the rules in a structure and sending
-

To: Ahmed S. Darwish <darwish.07@...>
Cc: Pavel Machek <pavel@...>, Casey Schaufler <casey@...>, <akpm@...>, <linux-security-module@...>, <linux-kernel@...>, Al Viro <viro@...>
Date: Monday, November 5, 2007 - 12:21 pm

I personally think string parsers are *much* better than the alternatives

That's *MUCH* worse.

Strings are nice. They aren't that complex, and as long as it's not a
performance-critical area, there are basically no downsides.

Binary structures and ioctl's are *much* worse. They are totally
undebuggable with generic tools (think "echo" or "strace"), and they are a
total nightmare to parse across architectures and pointer sizes.

So the rule should be: always use strings if at all possible and relevant.
If the data is fundamentally binary, it shouldn't be re-coded to ascii (no
real advantage), but if the data is "stringish", and there aren't big
performance issues, then keep it as strings.

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Ahmed S. Darwish <darwish.07@...>, Pavel Machek <pavel@...>, Casey Schaufler <casey@...>, <akpm@...>, <linux-security-module@...>, <linux-kernel@...>, Al Viro <viro@...>
Date: Tuesday, November 6, 2007 - 4:06 am

How do we get the information about the character encoding of the string

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

-

To: Adrian Bunk <bunk@...>
Cc: Ahmed S. Darwish <darwish.07@...>, Pavel Machek <pavel@...>, Casey Schaufler <casey@...>, <akpm@...>, <linux-security-module@...>, <linux-kernel@...>, Al Viro <viro@...>
Date: Tuesday, November 6, 2007 - 11:39 am

I really would expect that kernel strings don't have an encoding. They're
just C strings: a NUL-terminated stream of bytes.

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Ahmed S. Darwish <darwish.07@...>, Pavel Machek <pavel@...>, Casey Schaufler <casey@...>, <akpm@...>, <linux-security-module@...>, <linux-kernel@...>, Al Viro <viro@...>
Date: Tuesday, November 6, 2007 - 7:00 pm

You were the one who suggested to _parse_ strings in the kernel.

Some basic parsing is possible due to the fact that all encodings you
usually get are extensions of ASCII with additional properties.

But in this thread we discussed that the following implemented things
in Smack and TOMOYO currently do not work correctly when the string
is encoded in UTF-8:
- isgraph()
- isspace()
- match one character

How do you suggest to implement them in a safe way without knowing about

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

-

To: Adrian Bunk <bunk@...>
Cc: Ahmed S. Darwish <darwish.07@...>, Pavel Machek <pavel@...>, Casey Schaufler <casey@...>, <akpm@...>, <linux-security-module@...>, <linux-kernel@...>, Al Viro <viro@...>
Date: Tuesday, November 6, 2007 - 7:08 pm

So?

We do that for lots of things.

What do you think a filename is? And yes, we parse it. Things like '/' and
'.' and '..' have magic meaning.

You don't need to bring up idiotic things like character sets. You can see
it as a byte string. You're done with it.

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Ahmed S. Darwish <darwish.07@...>, Pavel Machek <pavel@...>, Casey Schaufler <casey@...>, <akpm@...>, <linux-security-module@...>, <linux-kernel@...>, Al Viro <viro@...>
Date: Tuesday, November 6, 2007 - 8:07 pm

We have the following properties in the character sets we handle:
- every ASCII character is encoded with the same byte as in ASCII
- if the eighth bit is 0, the byte can't be part of a multi-byte character
- no ASCII character can be encoded in a different way

This (plus most likely some other properties I've missed to mention)
allows some parsing based on ASCII characters.

But if you want to match "one character" (like TOMOYO does) or want to
check for printable characters except space (like Smack does) you must
know whether the byte string 0xC3 0xA0 is the character à or a sequence

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

-

To: Adrian Bunk <bunk@...>
Cc: Ahmed S. Darwish <darwish.07@...>, Pavel Machek <pavel@...>, Casey Schaufler <casey@...>, <akpm@...>, <linux-security-module@...>, <linux-kernel@...>, Al Viro <viro@...>
Date: Tuesday, November 6, 2007 - 8:27 pm

No you don't. You just check the binary value, and decide that ' ', '\t'
and '\n' are special.

Don't go get excited.

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Ahmed S. Darwish <darwish.07@...>, Pavel Machek <pavel@...>, Casey Schaufler <casey@...>, <akpm@...>, <linux-security-module@...>, <linux-kernel@...>, Al Viro <viro@...>
Date: Tuesday, November 6, 2007 - 8:43 pm

How should TOMOYO implement it's "match one character" in a pattern

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

-

To: Adrian Bunk <bunk@...>
Cc: Ahmed S. Darwish <darwish.07@...>, Pavel Machek <pavel@...>, Casey Schaufler <casey@...>, <akpm@...>, <linux-security-module@...>, <linux-kernel@...>, Al Viro <viro@...>
Date: Tuesday, November 6, 2007 - 9:06 pm

.. I think such a design is fundamentally bogus. You don't have
"characters". You have "bytes".

So you either implement "match one byte", or you go crazy. It's that
simple.

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Ahmed S. Darwish <darwish.07@...>, Pavel Machek <pavel@...>, Casey Schaufler <casey@...>, <akpm@...>, <linux-security-module@...>, <linux-kernel@...>, Al Viro <viro@...>
Date: Tuesday, November 6, 2007 - 9:59 pm

Sure, you can limit what is possible and what not.

But there are still many pitfalls, e.g. if someone would allow the
construct "[abc]" in patterns for matching one of these characters you'd
have to ensure that your syntax contains explicit character delimiters
or a pattern might match something completely different from what was
intended.

My opinion is that extended parsing of non-ASCII strings will cause too

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

-

To: Adrian Bunk <bunk@...>
Cc: Linus Torvalds <torvalds@...>, Ahmed S. Darwish <darwish.07@...>, Pavel Machek <pavel@...>, Casey Schaufler <casey@...>, <akpm@...>, <linux-security-module@...>, <linux-kernel@...>, Al Viro <viro@...>
Date: Wednesday, November 7, 2007 - 11:08 am

> Users are used to work on characters, not on bytes.

Absolutely true - but completely missing the point.

When I open a UTF-8 file name as displayed by nautilus the kernel does

If your regexps are reasonably complete you just have to turn the unicode
match into a byte match. That is a user interface problem - in user space.

Alan
-

To: Adrian Bunk <bunk@...>
Cc: Ahmed S. Darwish <darwish.07@...>, Pavel Machek <pavel@...>, Casey Schaufler <casey@...>, <akpm@...>, <linux-security-module@...>, <linux-kernel@...>, Al Viro <viro@...>
Date: Wednesday, November 7, 2007 - 12:09 am

Adrian, stop this idiocy. I'm not interested in listening to your
soliloqui about irrelevant stuff.

The kernel works on bytes. Deal with it. Stop whining.

You've been told several times that all the examples you showed were
irrelevant, and tomoyo worked on bytes too.

You have been told several times that the VFS layer works on bytes, and
has done so since day 1.

You have *also* been told that there is no real other option ("you can
work with bytes, or you can go mad"). The normal kernel interfaces have to
be locale-independent (parly because it doesn't even KNOW the locale,
partly because locale is just totally irrelevant).

And your statement above is a TOTAL AND UTTER LIE.

More people are used to work with bytes (the C language calls them "char")
than with what _you_ call "characters". The fact is, people are very very
very used to working with 8-bit bytes, and there are a lot more people who
understand them than people who understand UTF-8 (never mind any of the
other million possible stupid and insane locales).

So can you stop your inane whining now? You can either:

- accept that the kernel works on bytes (*) and that when we talk about
parsing strings, we're talking the very _traditional_ C meaning, which
is locale-independent, because locales DO NOT WORK in the kernel!

- or you can continue your irrelevant ranting that has nothing to do with
anything, but please don't cc me any more. People already pointed out
to you that your assumption that "character" means something else than
"byte" was wrong.

Please stop this. The absolute *last* thing you want is a kernel that
cares about locales. You *also* don't want a kernel that enforces some
idiotic UTF-8 rules, since not everybody is using UTF-8. That way lies
madness, not to mention totally unnecessary complexity.

Linus

(*) With some *very* rare special cases, notably in the console driver,
and for filesystems that are forced by idiot designers to be compatible
...

To: <bunk@...>
Cc: <darwish.07@...>, <pavel@...>, <casey@...>, <akpm@...>, <linux-security-module@...>, <linux-kernel@...>, <viro@...>, <torvalds@...>
Date: Tuesday, November 6, 2007 - 9:03 pm

Hello.

TOMOYO assumes that "one character" equals to "one byte"
since TOMOYO doesn't consider encoding.

According to your definition of "character" (i.e. it may be multibyte),
TOMOYO doesn't implement "match one character" pattern.
TOMOYO implements "match one byte" pattern.

Thanks.
-

To: Linus Torvalds <torvalds@...>
Cc: Pavel Machek <pavel@...>, Casey Schaufler <casey@...>, <akpm@...>, <linux-security-module@...>, <linux-kernel@...>, Al Viro <viro@...>
Date: Monday, November 5, 2007 - 7:38 pm

Thanks a lot for such a kind advice. I'll keep that in my mind.

Regards,

--
Ahmed S. Darwish
Homepage: http://darwish.07.googlepages.com
Blog: http://darwish-07.blogspot.com
-

To: <pavel@...>, <torvalds@...>
Cc: <darwish.07@...>, <casey@...>, <akpm@...>, <linux-security-module@...>, <linux-kernel@...>, <viro@...>
Date: Monday, November 5, 2007 - 5:56 pm

Not only pointer sizes bugs, but also checking pointer address costs.
For binary policy, we have to examine ->next pointer is valid or not.
We can't blindly use address supplied from userland.

I have encountered mismatch of kernel version and AppArmor's policy parser version
when I just updated only kernel. As a result, the segmentation faults rushed toward me.
From this experience, TOMOYO still uses string parser in the kernel.
If a parser doesn't consume much stack (i.e. call functions recursively), I think it is no problem.

Thanks.

-

To: Tetsuo Handa <penguin-kernel@...>
Cc: <pavel@...>, <torvalds@...>, <darwish.07@...>, <casey@...>, <akpm@...>, <linux-security-module@...>, <linux-kernel@...>, <viro@...>
Date: Tuesday, November 6, 2007 - 6:00 am

You have a "\?" pattern which is defined as "1 byte character other
than '/'".

The user usually doesn't know how many bytes a character in a path or
file name on his system has.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

-

To: <bunk@...>
Cc: <pavel@...>, <torvalds@...>, <darwish.07@...>, <casey@...>, <akpm@...>, <linux-security-module@...>, <linux-kernel@...>, <viro@...>
Date: Tuesday, November 6, 2007 - 8:27 am

Hello.

Don't worry. The "\?" pattern is for temporary files with /tmp/prefixXXXXXX pattern.
The "\*" pattern is for this purpose which means more than 0 byte characters other than '/'.

TOMOYO supports various patterns
http://tomoyo.sourceforge.jp/en/1.5.x/policy-reference.html#exception_po...

TOMOYO Linux handles string using 7bit ASCII. In TOMOYO Linux,
a byte 0x21 <= c <= 0x7E && c != 0x5C is represented as is,
c == 0x5C is represented as \\,
0x01 <= c <= 0x20 || 0x7F <= c <= 0xFF is represented as \ooo style.
c == 0x00 is not needed since it is used as end-of-string marker.
This rule makes any string passed from/to kernel safely.

Thanks.

-

To: Tetsuo Handa <penguin-kernel@...>
Cc: <pavel@...>, <torvalds@...>, <darwish.07@...>, <casey@...>, <akpm@...>, <linux-security-module@...>, <linux-kernel@...>, <viro@...>
Date: Tuesday, November 6, 2007 - 9:58 am

And there you document \* as well as \? as wildcards for "pathname
patterns".

And \* is not a replacement for \?. It's quite common to have both ways
to express "one character" and to express "at least one character", and

That's unrelated to this problem.

You talk about your internal byte representation.

But the problem is that in your code you only match one byte for \?,

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

-

To: <bunk@...>
Cc: <pavel@...>, <torvalds@...>, <darwish.07@...>, <casey@...>, <akpm@...>, <linux-security-module@...>, <linux-kernel@...>, <viro@...>
Date: Tuesday, November 6, 2007 - 10:32 am

Hello.

You can use \? to represent "one character" and
"one byte" is almost equal to "one character".
"\?" matches to one of the following types.

* 1 ASCII printable character (for 0x21-0x2E or 0x30-0x5B or 0x5D-0x7E)
* 2 ASCII printable characters \\ (for 0x5C, which means single "\")
* 4 ASCII printable characters \ooo (for 0x01-0x20 or 0x7F-0xFF, where "ooo" is octal value)

These 3 types represents one *byte*.
I want to say "\? matches to one character",
but since expression of a character depends on the value of that byte,
I'm saying "\? matches to one *byte* character" instead.
Well, this sentence might be confusing, but how can I express more accurately?

Thanks.

-

To: Tetsuo Handa <penguin-kernel@...>
Cc: <pavel@...>, <torvalds@...>, <darwish.07@...>, <casey@...>, <akpm@...>, <linux-security-module@...>, <linux-kernel@...>, <viro@...>
Date: Tuesday, November 6, 2007 - 10:59 am

The problem is that your code matches one byte, not one character.

More or less all userspace programs handle multi-byte UTF-8 characters
just fine without bothering the user with the fact whether a character
consists of one or more bytes.

And users will try to use this \? for matching one character when
writing a pattern that denies access.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

-

To: <bunk@...>
Cc: <pavel@...>, <torvalds@...>, <darwish.07@...>, <casey@...>, <akpm@...>, <linux-security-module@...>, <linux-kernel@...>, <viro@...>
Date: Tuesday, November 6, 2007 - 11:27 am

Hello.

I understood what you are saying.

You are saying "a character" does not always consist of one byte,
while I'm saying "a character" does always consist of one byte.

Yes, some userspace programs don't use strcmp()
since strcmp() can't handle some encodings like UTF-16.
But the kernel uses strcmp()
since the VFS related functions can't handle encodings
which contains '\0' in the pathname.
VFS related functions assume that '\0' is end-of-string marker.

So, from the point of view of userland programs,
'\?' should match to a single character (which depends on encoding).
But from the point of view of the kernel,
'\?' should match to a single byte (which doesn't depend on encoding).
Handling all possible encoding in the kernel is too difficult to implement.
Yes, but since this string is handled by the *kernel*,
I want users follow point of view of the kernel.

Thanks.

-

To: Tetsuo Handa <penguin-kernel@...>
Cc: <pavel@...>, <torvalds@...>, <darwish.07@...>, <casey@...>, <akpm@...>, <linux-security-module@...>, <linux-kernel@...>, <viro@...>
Date: Tuesday, November 6, 2007 - 6:42 pm

The common case isn't UTF-16, it's UTF-8.

Users are used to deal with characters and hot having to bother with all
the mess of different encodings.

Having patterns that describe rules to deny access in an LSM breaking

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

-

To: Pavel Machek <pavel@...>
Cc: Casey Schaufler <casey@...>, <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>
Date: Sunday, November 4, 2007 - 9:23 am

I've suggested that at first, but (hoping not to misquote Al) Al viro
said that the parsing is simple enough and no need exists for a

Would you please show the reason for the NAK so I can modify the code ?

--
Ahmed S. Darwish
Homepage: http://darwish.07.googlepages.com
Blog: http://darwish-07.blogspot.com
-

To: Ahmed S. Darwish <darwish.07@...>, Pavel Machek <pavel@...>
Cc: Casey Schaufler <casey@...>, <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>
Date: Sunday, November 4, 2007 - 12:37 pm

Woof. We've devolved into Al saying it must be done this way,
and Pavel saying it can't be done this way.

That's OK. Maybe it's time that I stepped back and came up
with a twisted mechanism for achieving my ends, as the obvious

Casey, who still thinks Pavel doesn't get it.

Casey Schaufler
casey@schaufler-ca.com
-

To: Ahmed S. Darwish <darwish.07@...>
Cc: Casey Schaufler <casey@...>, <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>
Date: Saturday, November 3, 2007 - 2:30 pm

Actually, what causes problems here is something between a bug and a
feature in libc's buffering. Basically the -EINVAL error causes libc
to leave its data in the file-output buffer despite the file being
closed and reopened. Since a standalone echo just exits that buffer
is discarded, but for the bash builtin it hangs around in the buffer
for a while and ends up getting prepended to the following echo
statement. There's actually multiple ways to make this fail; this is
just the simplest.

Cheers,
Kyle Moffett

-

To: Kyle Moffett <mrmacman_g4@...>
Cc: Casey Schaufler <casey@...>, <akpm@...>, <torvalds@...>, <linux-security-module@...>, <linux-kernel@...>
Date: Saturday, November 3, 2007 - 6:12 pm

Thanks a lot for such a useful info. Is there a way from my side to
make subsequent echo invocations not affected by previous failed ones
?

Regards,

--
Ahmed S. Darwish
Homepage: http://darwish.07.googlepages.com
Blog: http://darwish-07.blogspot.com
-

Previous thread: PROBLEM: fakephp hangs terminal in 2.6.22 (and 2.6.21), does not Hot Plug by john_flowers on Friday, November 2, 2007 - 5:16 pm. (3 messages)

Next thread: [PATCH] eCryptfs: Release mutex on hash error path by Michael Halcrow on Friday, November 2, 2007 - 4:51 pm. (1 message)