Re: [RFC] 0/11 fanotify: fscking all notification and file access system (intended for antivirus scanning and file indexers)

Previous thread: num_possible_cpus() giving more than possible. by Steven Rostedt on Friday, September 26, 2008 - 1:22 pm. (2 messages)

Next thread: [RFC 2/11] fsnotify: pass a file instead of an inode to open modify and read by Eric Paris on Friday, September 26, 2008 - 2:18 pm. (2 messages)

The following is a file notification and access system intended to allow
a variety of userspace programs to get information about filesystem
events no matter where or how they happen on a system, and to use that
in conjunction with the actual on-disk data related to that event to
provide additional services such as file change indexing or
content-based antivirus scanning.  With minor changes, this notification
and access interface could almost certainly be made usable for HSMs.
fscking all notify is generally referred to as fanotify.  The ideas
behind this code are based on talpa, the GPL antivirus interface
originally pioneered by Sophos, and on the feedback from lkml and
malware-list.  This is, however, a complete rewrite from scratch, so if
you remember talpa, 'this ain't it.'

The most up-to-date (but not always working) patch set can always be
found at http://people.redhat.com/~eparis/fanotify

Comments, attacks, criticism, bad names, and really just about anything
can be sent to me, but please let's not rehash useless conversations!  I
will send the full patch set to both lists, but I'm not going to cc
everyone individually.

**fanotify-executive-summary**

fanotify has 7 event types and only sends events for S_ISREG() files.
The event types are OPEN, READ, WRITE, CLOSE_WRITE, CLOSE_NOWRITE,
OPEN_ACCESS, and READ_ACCESS.  The OPEN_ACCESS and READ_ACCESS events
require that the listener return some sort of allow/deny/more_time
response, as the original process blocks until it gets a response (or
times out).  Listeners may register a group which will get notifications
about any combination of these events.  Antivirus scanners will likely
want OPEN_ACCESS and READ_ACCESS, while file indexers would likely use
the non-ACCESS forms of these events.
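A minimal sketch of the event set, assuming hypothetical bit values (the FAN_EV_* names and both helpers are invented here; the RFC names the seven events but does not assign numbers): only the two ACCESS events put the opener to sleep waiting for a verdict, and a group's mask decides which events it hears at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values: the RFC names these events but does not
 * number them, so these are illustrative only. */
enum fan_event {
	FAN_EV_OPEN          = 1u << 0,
	FAN_EV_READ          = 1u << 1,
	FAN_EV_WRITE         = 1u << 2,
	FAN_EV_CLOSE_WRITE   = 1u << 3,
	FAN_EV_CLOSE_NOWRITE = 1u << 4,
	FAN_EV_OPEN_ACCESS   = 1u << 5,
	FAN_EV_READ_ACCESS   = 1u << 6,
};

/* Events for which the original process blocks awaiting a verdict. */
#define FAN_EV_ACCESS_MASK (FAN_EV_OPEN_ACCESS | FAN_EV_READ_ACCESS)

/* True if this single event needs an allow/deny/more_time response. */
static bool fan_needs_response(uint32_t event)
{
	assert(event != 0 && (event & (event - 1)) == 0); /* exactly one bit */
	return (event & FAN_EV_ACCESS_MASK) != 0;
}

/* A group is only notified about events in its registered mask. */
static bool fan_group_wants(uint32_t group_mask, uint32_t event)
{
	return (group_mask & event) != 0;
}
```

Under these assumptions an antivirus group would register FAN_EV_ACCESS_MASK, while an indexer might register only FAN_EV_CLOSE_WRITE.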

Groups are a construct in which userspace indicates what priority (only
really used for ACCESS-type events) and what types of events its
listeners want to hear.  A single group may have unlimited listeners,
but each event will only go to ONE listener.  ...
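A sketch of the one-listener-per-event rule, assuming a simple round-robin choice among a group's listeners (the RFC only says each event goes to ONE listener; the struct and function are hypothetical, not the patch's data structures).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory model of a group. */
struct fan_group {
	int priority;       /* only really matters for ACCESS events */
	uint32_t mask;      /* events this group's listeners want */
	int nlisteners;     /* listeners currently attached */
	int next;           /* round-robin cursor */
};

/* Choose the ONE listener that receives this event, or -1 if the group
 * does not want the event or has nobody listening. Round-robin is an
 * assumption made for this sketch. */
static int fan_pick_listener(struct fan_group *g, uint32_t event)
{
	assert(g != NULL);
	if (!(g->mask & event) || g->nlisteners <= 0)
		return -1;
	int chosen = g->next % g->nlisteners;
	g->next = (chosen + 1) % g->nlisteners;
	return chosen;
}
```

Spreading events this way lets a scanner add worker processes to one group without any event being scanned twice.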

I thought the operation was usually called "mkdir" which also nicely


Ok that is foul as an interface, utterly gross. I guess it would be

That raises security and correctness questions with things like "make it
swap hard" attacks. Given that any timeout can be configured, it's not a
big deal. It does need to handle process death or close of the
notification descriptors.
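One way the blocked side could cope with the timeout and listener-death cases, sketched in userspace with a pipe standing in for the real channel (the one-byte 'a'/'d' protocol and all names are hypothetical): poll() with the configured timeout, and treat EOF on the descriptor as the listener having gone away. Whether a timeout or a dead listener should then fail open or closed is left to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

enum fan_verdict { FAN_V_ALLOW, FAN_V_DENY, FAN_V_TIMEOUT, FAN_V_LISTENER_GONE };

/* Wait up to timeout_ms for a one-byte verdict on resp_fd:
 * 'a' = allow, anything else = deny (fail closed on garbage). */
static enum fan_verdict fan_wait_verdict(int resp_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = resp_fd, .events = POLLIN };
	int n;

	assert(resp_fd >= 0);
	do {
		n = poll(&pfd, 1, timeout_ms);  /* restarts full timeout on EINTR */
	} while (n < 0 && errno == EINTR);

	if (n == 0)
		return FAN_V_TIMEOUT;
	if (n < 0)                      /* treat poll errors like a vanished listener */
		return FAN_V_LISTENER_GONE;

	char c;
	ssize_t r = read(resp_fd, &c, 1);
	if (r <= 0)                     /* EOF: listener exited or closed its end */
		return FAN_V_LISTENER_GONE;
	return c == 'a' ? FAN_V_ALLOW : FAN_V_DENY;
}
```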

I think the mechanism is pretty sound. There are some "how do I" cases to
do with open, and with watching for events when I want to rescan
something because it has been dirty for a while. I'm not sure that
dirtying a file through mmap properly updates the file mtime; that wants
doing anyway for backups, though, so it is the real fix.

The userspace API you propose should however be taken out and shot, then
buried with a stake through its heart, holy water in its mouth and its
head cut off, at midnight in a pentacle at a crossroads in the presence
of a priest.

The two discussions are fortunately orthogonal. Is there any reason you
can't use the socket-based notification model? That gives you a much
more natural way to express the thing:


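		old_fd = socket(PF_FAN, ...);
		bind(old_fd, &addr, sizeof(addr));  /* addr: AF_FAN, group=foo+flags etc */

		fd = accept(old_fd, &addr, &len);   /* addr filled with returned info */
		/* ... read/scan the file through fd ... */
		close(fd);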

as well as fairly natural and, importantly, standards-defined semantics
for poll (including polling for new file handles), for reconfiguration
of stuff via get/setsockopt (which do pass stuff like object sizes,
unlike ioctls), and for reading/writing data.
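Because PF_FAN is only a proposal here, this sketch uses ordinary pipes to illustrate the point about poll: a listener could multiplex its event socket with any other descriptors using nothing but standard calls. fan_wait_any() and MAX_WATCH are invented names.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_WATCH 16

/* Block until one of fds[] is readable (or hung up) and return its
 * index; -1 on timeout or error. With a PF_FAN socket among fds,
 * "readable" would mean accept() will not block. */
static int fan_wait_any(const int *fds, size_t nfds, int timeout_ms)
{
	struct pollfd pfds[MAX_WATCH];

	assert(nfds > 0 && nfds <= MAX_WATCH);
	for (size_t i = 0; i < nfds; i++) {
		pfds[i].fd = fds[i];
		pfds[i].events = POLLIN;
		pfds[i].revents = 0;
	}
	if (poll(pfds, (nfds_t)nfds, timeout_ms) <= 0)
		return -1;
	for (size_t i = 0; i < nfds; i++)
		if (pfds[i].revents & (POLLIN | POLLHUP))
			return (int)i;
	return -1;
}
```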

It's not quite the same as a normal socket, given that you accept and get
a non-socket fd with the info you need in the return address area, but
it's much closer than the rather mad filesystem proposal.

It would certainly be sane enough to, for example, start writing
scanners in stuff like python-twisted or ruby on rails (not that this is
necessarily a good thing!)

Alan

--


So does configfs.  Eric, why not use that instead?  It sounds like it
will work nicely here.

thanks,

greg k-h
--


You don't; you create a new one and unregister the old one if you want
something different.  There is no limit on the number of groups and
registered groups with nothing actively sitting there with the

I took great care to make sure the interface and the implementation
were cleanly separated.  Heck, they are even in different _user files.
I clearly remembered gregkh hating it when I passed binary blobs, and
you suggested syscalls.  This interface was meant to be easily extended,
quickly prototyped, and eventually thrown away for something the list
likes.  The main goal was to make sure all communication was
unidirectional and race-free.  A very similar interface with syscalls
could use:

	fanotify_control(...);          /* need to think about it: register/unregister */
	fd = fanotify_get_notify(buf);  /* buf receives a string of metadata */

You're suggesting a malicious program attached to a listener?  Yeah,
they can do horrible things to your machine.  My thought was that these
files are root-only and selinux can easily control who can read/write

Not sure what you meant by part 1.  ACCESS events require an immediate
answer.  If you want to batch up some write events and scan them with
another process, that's fine.  Pass your fd to that other process and
remember the pid of that other process.  Every time you get an event
from that other process, just allow it.  That other process should not
have trouble adding the fastpath entry itself.
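A minimal sketch of the bookkeeping described above, assuming the listener learns the triggering pid from the event metadata (helper_set and the function names are hypothetical): events caused by the listener's own helper are simply allowed, presumably so the helper never ends up blocked on the scanner that handed it the work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Pids of helper processes this listener has handed its fd to. */
struct helper_set {
	pid_t pids[8];
	size_t n;
};

static void helper_add(struct helper_set *s, pid_t pid)
{
	assert(s->n < sizeof(s->pids) / sizeof(s->pids[0]));
	s->pids[s->n++] = pid;
}

/* True if the event came from one of our helpers and can be allowed
 * without scanning. */
static bool helper_is_ours(const struct helper_set *s, pid_t event_pid)
{
	for (size_t i = 0; i < s->n; i++)
		if (s->pids[i] == event_pid)
			return true;
	return false;
}
```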

I thought we fixed mmap updating mtime a while back.  I'll test and make


The socket model you describe works very well and cleanly to replace the
'notification' part, but I can't think offhand how to send information
back nearly as cleanly.  I guess we replace writing to access and
fastpath with setsockopt?  Now how to make those easily extensible.....


As an aside, I'm trying to get some quick and dirty perf numbers.  My
SCSI driver isn't loading on my test machine with a hand-built kernel,
so I might not have any numbers till Monday.

-Eric

--


An hour, a whiteboard, 3 other hackers and I think I have a handle on
something you might like a little more.

groups will be 'created' when you call bind().  the struct sockaddr will
include a priority and an event mask.  Group names will be eliminated
since priorities must be unique.  Two processes will be allowed to
bind() to the same priority if the mask is the same.  Those will be
considered to be in the same 'group.'

Groups will be destroyed when ALL fds associated with that group are
closed.
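A userspace model of these bind()/close() rules, sketched under assumptions (fixed-size table, invented names and error codes; not kernel code): the same priority plus the same mask joins the existing group, the same priority with a different mask is refused because priorities must be unique, and the group disappears when its last fd is closed.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_GROUPS 32

struct fan_group_slot {
	int in_use;
	int priority;
	uint32_t mask;
	int refs;               /* open fds bound to this group */
};

static struct fan_group_slot groups[MAX_GROUPS];

/* bind(): returns the slot index, -EBUSY if the priority is taken with a
 * different mask, or -ENOSPC if the table is full. */
static int fan_bind(int priority, uint32_t mask)
{
	int free_slot = -1;

	for (int i = 0; i < MAX_GROUPS; i++) {
		if (!groups[i].in_use) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (groups[i].priority != priority)
			continue;
		if (groups[i].mask != mask)
			return -EBUSY;      /* priorities must be unique */
		groups[i].refs++;       /* same priority + mask: same group */
		return i;
	}
	if (free_slot < 0)
		return -ENOSPC;
	groups[free_slot] = (struct fan_group_slot){ 1, priority, mask, 1 };
	return free_slot;
}

/* close(): the group is destroyed when its last fd goes away. */
static void fan_release(int slot)
{
	assert(slot >= 0 && slot < MAX_GROUPS && groups[slot].in_use);
	if (--groups[slot].refs == 0)
		groups[slot].in_use = 0;
}
```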

Calling accept() on the socket from bind() will return a new fd.  This fd
will be created in the kernel using dentry_open() (just like I do it
today), only I will then try to overload and bastardize the new file to
add support so that it will allow sendmsg() with flags = MSG_OOB, or
setsockopt().  I'll also present an alternative below.

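The struct sockaddr from the accept() will be filled with something
like

struct fan_sockaddr {
	int version;		/* metadata format version */
	unsigned int mask;	/* event type bit(s) for this notification */
	pid_t pid;		/* process that triggered the event */
	pid_t tgid;		/* thread group id of that process */
	int f_flags;		/* file flags (f_flags) for the file */
};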

So this will be a binary interface for metadata.  Sending the metadata
about the open fd up through the sockaddr is very slick, but not easily
extended that I can see.  Guess we need to get the metadata right the
first time.

One way to do responses from the listeners (like access decisions and
fastpath entries) would be by sending a message back down the new fd
using sendmsg(MSG_OOB).  The PF_FAN 'stuff' should be able to get this
message and do its magic.  I don't have a format for this message
thought up.  Maybe __u32 len, __u32 version, followed by whatever.
Maybe people would prefer I bastardize setsockopt() on the new fd sent
to the listener?  Alternative still to come...
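A sketch of how a "__u32 len, __u32 version" header could be validated, assuming a hypothetical v1 body carrying a single verdict word (struct and field names invented; uint32_t used in place of __u32 to stay in userspace). Checking the length against the v1 size, rather than requiring an exact match, is what would let later versions append fields.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

struct fan_response_hdr {
	uint32_t len;       /* total message length, header included */
	uint32_t version;   /* format version of what follows */
};

struct fan_response_v1 {
	struct fan_response_hdr hdr;
	uint32_t verdict;   /* e.g. 0 = allow, 1 = deny */
};

#define FAN_RESPONSE_V1 1

/* Validate a message received from a listener. Longer, newer messages
 * are accepted as long as the v1 prefix is present. */
static int fan_parse_response(const void *buf, size_t buflen, uint32_t *verdict)
{
	struct fan_response_v1 msg;

	assert(buf != NULL && verdict != NULL);
	if (buflen < sizeof(msg))
		return -EINVAL;
	memcpy(&msg, buf, sizeof(msg));     /* avoid unaligned access */
	if (msg.hdr.len < sizeof(msg) || msg.hdr.len > buflen)
		return -EINVAL;
	if (msg.hdr.version < FAN_RESPONSE_V1)
		return -EINVAL;
	*verdict = msg.verdict;
	return 0;
}
```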

Some operations a listener program might want to do may not be
associated with an event.  This might include flushing all fastpaths on
a definitions update or preemptively adding a fastpath entry for an fd.
I suggest calling connect() on the bound fd to connect to the kernel
PF_FAN ...
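A rough model of a fastpath table supporting the operations listed above, assuming entries keyed by (device, inode) in a small direct-mapped table (all names are hypothetical; clearing an entry when the file is modified is an assumption, not something the RFC spells out here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define FASTPATH_SLOTS 64

/* A file scanned clean is remembered so later opens need not be sent
 * to the listener. */
struct fastpath_entry {
	dev_t dev;
	ino_t ino;
	bool valid;
};

static struct fastpath_entry fastpath[FASTPATH_SLOTS];

static size_t fp_slot(dev_t dev, ino_t ino)
{
	return (size_t)((ino ^ (ino_t)dev) % FASTPATH_SLOTS);
}

/* Preemptively add an entry; may evict whatever shared the slot. */
static void fp_add(dev_t dev, ino_t ino)
{
	fastpath[fp_slot(dev, ino)] = (struct fastpath_entry){ dev, ino, true };
}

static bool fp_hit(dev_t dev, ino_t ino)
{
	const struct fastpath_entry *e = &fastpath[fp_slot(dev, ino)];
	return e->valid && e->dev == dev && e->ino == ino;
}

/* A modification makes the earlier "clean" result stale. */
static void fp_invalidate(dev_t dev, ino_t ino)
{
	struct fastpath_entry *e = &fastpath[fp_slot(dev, ino)];
	if (e->valid && e->dev == dev && e->ino == ino)
		e->valid = false;
}

/* e.g. after an antivirus definitions update */
static void fp_flush_all(void)
{
	for (size_t i = 0; i < FASTPATH_SLOTS; i++)
		fastpath[i].valid = false;
}
```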

The usual way socket stuff covers for that is to stick

	unsigned int __unused[8];
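A sketch of that suggestion applied to the fan_sockaddr proposed earlier. The array is called reserved rather than __unused, since double-underscore names are reserved for the implementation (and __unused is a macro on some BSD-derived systems); the v2 struct and its new_field are hypothetical, showing how a later field can consume a reserved word without changing the size userspace passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

struct fan_sockaddr_v1 {
	int version;
	unsigned int mask;
	pid_t pid;
	pid_t tgid;
	int f_flags;
	unsigned int reserved[8];   /* must be zero; room to grow */
};

/* A later revision spends one reserved word on a new field. */
struct fan_sockaddr_v2 {
	int version;
	unsigned int mask;
	pid_t pid;
	pid_t tgid;
	int f_flags;
	unsigned int new_field;     /* hypothetical addition */
	unsigned int reserved[7];
};

_Static_assert(sizeof(struct fan_sockaddr_v1) == sizeof(struct fan_sockaddr_v2),
	       "reserved space keeps the structure size stable");

/* Receivers can insist on zeroed reserved words so they can be
 * given a meaning later. */
static bool fan_reserved_is_zero(const struct fan_sockaddr_v1 *a)
{
	for (unsigned i = 0; i < 8; i++)
		if (a->reserved[i] != 0)
			return false;
	return true;
}
```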


That is the normal socket approach. Eg in traditional BSD interfaces for


Not really. Lots of socket types have operations that are essentially

	fd = socket(...);
	ioctl(fd, ...);
	close(fd);

or similar. Traditionally, ioctl is used for system-changing stuff, but
that is just tradition.
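To make that socket()/ioctl()/close() pattern concrete, here is a sketch with an existing request, SIOCGIFFLAGS on an ordinary AF_INET datagram socket (Linux/glibc headers assumed; iface_is_up() is an invented helper, and nothing here is PF_FAN-specific).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the interface is up, 0 if down, -1 on error
 * (e.g. no such interface, or no socket could be created). */
static int iface_is_up(const char *name)
{
	struct ifreq ifr;
	int fd, ret = -1;

	fd = socket(AF_INET, SOCK_DGRAM, 0);   /* fd = socket(...) */
	if (fd < 0)
		return -1;
	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);
	if (ioctl(fd, SIOCGIFFLAGS, &ifr) == 0)  /* ioctl(fd, ...) */
		ret = (ifr.ifr_flags & IFF_UP) ? 1 : 0;
	close(fd);                               /* close(fd) */
	return ret;
}
```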

--


sending a message out for every READ/WRITE seems like it will generate a 
LOT of messages, and very few will be ones that anyone cares about.

one of the nice things about the TALPA approach was that there was an 
ability to notify only on a change of state (i.e. when a file that had 
been scanned was changed)

this could do a similar thing, but I think it would be a much more 
expensive process to do it all in userspace.

David Lang

--


On read there isn't much point anyway; on write, if you simply send one,
save an event counter number, and don't send another until the last one
is cleared, it all works well. When the last event is cleared, if
another event has occurred then the event counter will have changed, so
you know to send one immediately. If the app doesn't want to receive
them for a while, it can just hang onto the event for a minute or two
before clearing it.
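A minimal sketch of the counter scheme above for a single file, with invented names: writes while an event is outstanding only bump the counter, and clearing the event re-sends at once if the counter moved since it was sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct write_notify_state {
	uint64_t counter;   /* bumped on every write */
	uint64_t sent_at;   /* counter value when the last event was sent */
	bool outstanding;   /* an event is queued and not yet cleared */
};

/* Called on each write. Returns true if an event should be sent now. */
static bool on_write(struct write_notify_state *s)
{
	s->counter++;
	if (s->outstanding)
		return false;           /* coalesce: listener already knows */
	s->outstanding = true;
	s->sent_at = s->counter;
	return true;
}

/* Called when the listener clears the event. Returns true if writes
 * happened meanwhile, in which case a fresh event goes out at once. */
static bool on_clear(struct write_notify_state *s)
{
	assert(s->outstanding);
	if (s->counter != s->sent_at) {
		s->sent_at = s->counter;    /* still outstanding: new event sent */
		return true;
	}
	s->outstanding = false;
	return false;
}
```

A listener that wants to rate-limit simply delays calling the clear path, exactly as suggested above.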

--


Actually both read and write seem useless, as both can be bypassed by
mmap...?

								Pavel 

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--


See the fastpath patch and explanation.  Doesn't help for writes...

--
