The following is a file notification and access system intended to allow a variety of userspace programs to get information about filesystem events no matter where or how they happen on a system, and to use that in conjunction with the actual on-disk data related to that event to provide additional services such as file change indexing or content-based antivirus scanning. Minor changes are almost certainly possible to make this notification and access interface usable for HSMs.

fscking all notify is generally referred to as fanotify. The ideas behind this code are based on talpa, the GPL antivirus interface originally pioneered by Sophos, and on the feedback from lkml and malware-list. This is however a complete rewrite from scratch, so if you remember talpa, 'this ain't it.'

The most up-to-date (but not always working) patch set can always be found at http://people.redhat.com/~eparis/fanotify

Comments, attacks, criticism, bad names, and really just about anything can be sent to me, but please let's not rehash useless conversations! I will send the full patch set to both lists, but I'm not going to cc everyone individually.

**fanotify-executive-summary**

fanotify has 7 event types and only sends events for S_ISREG() files. The event types are OPEN, READ, WRITE, CLOSE_WRITE, CLOSE_NOWRITE, OPEN_ACCESS, and READ_ACCESS. Events OPEN_ACCESS and READ_ACCESS require that the listener return some sort of allow/deny/more_time response, as the original process blocks until it gets a response (or times out).

Listeners may register a group which will get notifications about any combination of these events. Antivirus scanners will likely want OPEN_ACCESS and READ_ACCESS, while file indexers would likely use the non-ACCESS form of these events.

Groups are a construct in which userspace indicates what priority (only really used for ACCESS type events) and what type of events its listeners want to hear. A single group may have unlimited listeners, but each event will only go to ONE listener. ...
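To make the event list concrete, here is a minimal sketch of how the seven events could be carried as a bitmask and how the two blocking ACCESS events could be told apart from the rest. The bit values, the `FAN_` prefix and both helper names are assumptions made for illustration only; the post names the events but does not give their encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values: the post lists the events, not their encoding. */
enum fan_event {
    FAN_OPEN          = 1u << 0,
    FAN_READ          = 1u << 1,
    FAN_WRITE         = 1u << 2,
    FAN_CLOSE_WRITE   = 1u << 3,
    FAN_CLOSE_NOWRITE = 1u << 4,
    FAN_OPEN_ACCESS   = 1u << 5,
    FAN_READ_ACCESS   = 1u << 6,
};

/* The two events for which the originating process blocks. */
#define FAN_ACCESS_EVENTS ((uint32_t)(FAN_OPEN_ACCESS | FAN_READ_ACCESS))

/* True if a listener must answer allow/deny/more_time for this event. */
static bool fan_event_needs_response(uint32_t event)
{
    assert(event != 0);
    return (event & FAN_ACCESS_EVENTS) != 0;
}

/* True if a group registered with group_mask should be told about event. */
static bool fan_group_wants(uint32_t group_mask, uint32_t event)
{
    assert(event != 0);
    return (group_mask & event) != 0;
}
```

Under these assumptions an indexer might register `FAN_OPEN | FAN_CLOSE_WRITE`, while a scanner would register `FAN_OPEN_ACCESS | FAN_READ_ACCESS` and be prepared to answer every event it receives.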
I thought the operation was usually called "mkdir" which also nicely

Ok that is foul as an interface, utterly gross. I guess it would be

That raises security and correctness questions with things like "make it swap hard" attacks.

Given that any timeout can be configured it's not a big deal. Do need to handle process death or close of the notification descriptors.

I think the mechanism is pretty sound. There are some "how do I" cases to do with open and watching for events when I want to rescan something as it has been dirty for a while. I'm not sure mmap dirty properly updates the file mtime - that wants doing anyway for backups tho so is the real fix.

The userspace API you propose should however be taken out and shot, then buried with a stake through its heart, holy water in its mouth and its head cut off, at midnight in a pentacle at a crossroads in the presence of a priest.

The two discussions are fortunately orthogonal.

Is there any reason you can't use the socket based notification model - that gives you a much more natural way to express the thing

    socket
    bind(AF_FAN, group=foo+flags etc, PF_FAN);
    fd = accept(old_fd, &addr[returned info])
    close(fd);

as well as fairly natural and importantly standards defined semantics for poll including polling for new file handles, for reconfiguration of stuff via get/setsockopt (which do pass stuff like object sizes unlike ioctls) and for reading/writing data.

It's not quite the same as a normal socket given you accept and get a non socket fd with the info you need in the return address area but it's much closer than the rather mad file system proposal.

It would certainly be sane enough to, for example, start writing scanners in stuff like python-twisted or ruby on rails (not that this is necessarily a good thing!)

Alan
--
So does configfs. Eric, why not use that instead, it sounds like it will work here nicely.

thanks,

greg k-h
--
you don't, you create a new one and unregister the old one if you want something different. There is no limit on the number of groups and registered groups with nothing actively sitting there with the

I took great care in making sure the interface and the implementation were cleanly separated. Heck, they are even in different _user files. I clearly remembered gregkh hating me passing binary blobs and you suggested syscalls. This interface was to be easily extended, quickly prototyped, and eventually thrown away for something the list likes. The main goal was to make sure all communication was unidirectional and race free.

A very similar interface with syscalls could use

    fanotify_control (need to think about it, register/unregister)
    fd = fanotify_get_notify(%[buffer for string of metadata])

You're suggesting a malicious program attached to a listener? Yeah, they can do horrible things to your machine. My thoughts were these files are root only and selinux can easily control who can read/write

not sure what you meant by part 1. ACCESS events require an immediate answer. If you want to batch up some write events and scan it with another process that's fine. Pass your fd to that other process and remember the pid of that other process. Every time you get an event from that other process just allow it. That other process should not have trouble adding the fastpath entry itself.

I thought we fixed mmap updates mtime a while back. I'll test and make

The socket model you describe works very well and cleanly to replace the 'notification' part, but I can't think offhand how to send information nearly as cleanly back. I guess we replace writing to access and fastpath with setsockopt? Now how to make those easily extensible.....

As an aside I'm trying to get some quick and dirty perf numbers. My scsi driver isn't loading on my test machine with hand built kernel so I might not have any numbers till Monday.

-Eric
--
An hour, a whiteboard, 3 other hackers and I think I have a handle on
something you might like a little more.

Groups will be 'created' when you call bind(). The struct sockaddr will
include a priority and an event mask. Group names will be eliminated
since priorities must be unique. Two processes will be allowed to
bind() to the same priority if the mask is the same. Those will be
considered to be in the same 'group.'

Groups will be destroyed when ALL fds associated with that group are
closed.
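
The bind() and close() rules above can be sketched as a small userspace model. This is only an illustration of the stated rules (same priority needs the same mask, group goes away with its last fd), not the proposed kernel code; `struct fan_table`, `fan_bind()`, `fan_close()` and the `MAX_GROUPS` limit are all hypothetical names invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GROUPS 16   /* arbitrary limit for this sketch only */

struct fan_group {
    int      priority;  /* unique per group */
    uint32_t mask;      /* event mask shared by every fd in the group */
    int      refs;      /* number of fds currently bound to the group */
};

struct fan_table {
    struct fan_group groups[MAX_GROUPS];
    size_t           count;
};

/* Join or create the group at this priority.
 * Returns 0 on success, -1 if the priority is in use with a different
 * mask or the table is full. */
static int fan_bind(struct fan_table *t, int priority, uint32_t mask)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (t->groups[i].priority == priority) {
            if (t->groups[i].mask != mask)
                return -1;          /* same priority, different mask */
            t->groups[i].refs++;    /* same group, one more listener */
            return 0;
        }
    }
    if (t->count == MAX_GROUPS)
        return -1;
    t->groups[t->count++] = (struct fan_group){ priority, mask, 1 };
    return 0;
}

/* Drop one fd from the group; the group is destroyed with its last fd.
 * Returns 0 on success, -1 if no such group exists. */
static int fan_close(struct fan_table *t, int priority)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (t->groups[i].priority == priority) {
            if (--t->groups[i].refs == 0)
                t->groups[i] = t->groups[--t->count]; /* fill the hole */
            return 0;
        }
    }
    return -1;
}
```

In this model two scanners binding priority 10 with identical masks share one group, while a third bind at priority 10 with a different mask is refused.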

Calling accept() on the socket from bind() will return a new fd. This fd
will be created in the kernel using dentry_open() (just like I do it
today), only I will then try to overload and bastardize the new file to
add additional support so that it will allow sendmsg() with flags =
MSG_OOB or setsockopt(). I'll also present an alternative below.

The struct sockaddr from the accept() will be filled with something
like

    struct fan_sockaddr {
            int version;
            unsigned int mask;
            pid_t pid;
            pid_t tgid;
            int f_flags;
    };

So this will be a binary interface for metadata. Sending the metadata
about the open fd up the sockaddrs is very slick, but not easily
extended that I can see. Guess we need to get the metadata right the
first time.
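
One common way to leave room in a binary structure like this is to combine the version field with zero-filled padding that later versions can claim. The sketch below assumes that approach; the `reserved` array, the `FAN_SOCKADDR_VERSION` value and the `fan_sockaddr_valid()` helper are hypothetical additions, not part of the structure proposed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define FAN_SOCKADDR_VERSION 1   /* hypothetical version number */

struct fan_sockaddr {
    int          version;
    unsigned int mask;
    pid_t        pid;
    pid_t        tgid;
    int          f_flags;
    unsigned int reserved[8];    /* assumption: zero-filled room to grow */
};

/* A listener accepts only the layout it was built against: the buffer
 * must be large enough and carry the expected version. */
static bool fan_sockaddr_valid(const struct fan_sockaddr *sa, size_t len)
{
    assert(sa != NULL);
    if (len < sizeof(*sa))
        return false;
    return sa->version == FAN_SOCKADDR_VERSION;
}
```

A listener would run such a check on the address returned by accept() before trusting `mask`, `pid` or `f_flags`. Because the padding fixes the structure size up front, a new field can replace a reserved slot without changing the size that older listeners expect.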

One way to do responses from the listeners (like access decisions and
fastpath entries) would be by sending a message back down the new fd
using sendmsg(MSG_OOB). The PF_FAN 'stuff' should be able to get this
message and do its magic. I don't have a format for this message
thought up. Maybe __u32 len, __u32 version, do whatever. Maybe people
would prefer I bastardize setsockopt() on this new fd that was sent to
the listener? Alternative still to come...
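
Since no format exists yet, the following is purely a sketch of what a length-plus-version header could look like for an access decision. Only the `len` and `version` fields and the allow/deny/more_time responses come from the thread; the struct names, the numeric response codes and both helper functions are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FAN_MSG_VERSION 1u   /* hypothetical */

/* Hypothetical numeric codes for the three responses named in the thread. */
enum fan_response { FAN_ALLOW = 1, FAN_DENY = 2, FAN_MORE_TIME = 3 };

struct fan_msg_hdr {
    uint32_t len;       /* total message size in bytes, header included */
    uint32_t version;
};

struct fan_access_msg {
    struct fan_msg_hdr hdr;
    uint32_t           response;
};

/* Serialize a decision into buf. Returns bytes written, or 0 if buf is
 * too small. */
static size_t fan_pack_response(void *buf, size_t buflen, enum fan_response r)
{
    struct fan_access_msg m;

    assert(buf != NULL);
    if (buflen < sizeof(m))
        return 0;
    m.hdr.len = (uint32_t)sizeof(m);
    m.hdr.version = FAN_MSG_VERSION;
    m.response = (uint32_t)r;
    memcpy(buf, &m, sizeof(m));
    return sizeof(m);
}

/* Parse a decision. Returns 0 on success, -1 on a short, mis-versioned
 * or otherwise malformed message. */
static int fan_parse_response(const void *buf, size_t buflen,
                              enum fan_response *out)
{
    struct fan_access_msg m;

    assert(buf != NULL && out != NULL);
    if (buflen < sizeof(m.hdr))
        return -1;
    memcpy(&m.hdr, buf, sizeof(m.hdr));
    if (m.hdr.version != FAN_MSG_VERSION ||
        m.hdr.len < sizeof(m) || m.hdr.len > buflen)
        return -1;
    memcpy(&m, buf, sizeof(m));
    if (m.response < FAN_ALLOW || m.response > FAN_MORE_TIME)
        return -1;
    *out = (enum fan_response)m.response;
    return 0;
}
```

The explicit `len` lets a receiver skip trailing fields it does not understand, and `version` lets it reject layouts it cannot parse, which is one way to address the extensibility worry whichever transport (sendmsg() or setsockopt()) ends up carrying the bytes.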

Some operations a listener program might want to do may not be
associated with an event. This might include flushing all fastpaths on
a definitions update or preemptively adding a fastpath entry for an fd.
I suggest calling connect() on the bound fd to connect to the kernel
PF_FAN ...
--
Usual way socket stuff covers for that is to stick

    unsigned int __unused[8];

That is the normal socket approach. Eg in traditional BSD interfaces for

Not really. Lots of socket types have operations that are essentially

    fd = socket(...)
    ioctl(fd, ....);
    close(fd);

or similar. Traditionally ioctl is used for system changing stuff but that is just tradition.
--
Sending a message out for every READ/WRITE seems like it will generate a LOT of messages, and very few will be ones that anyone cares about.

One of the nice things about the TALPA approach was that there was an ability to notify only on a change of state (i.e. when a file that had been scanned was changed).

This could do a similar thing, but I think it would be a much more expensive process to do it all in userspace.

David Lang
--
On read there isn't much point anyway, on write if you simply send one, save an event counter number and don't send another until the last one is cleared it all works well. When the last event is cleared if another event has occurred then the event counter will have changed so you know to send one immediately, if the app doesn't want to receive them for a while it can just hang onto the event for a minute or two before clearing it. --
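
The counter scheme described in this reply can be sketched as a tiny state machine. This is a userspace model of the idea only; `struct write_notifier` and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct write_notifier {
    uint64_t counter;      /* bumped on every write to the file */
    uint64_t sent_at;      /* counter value when the last event was sent */
    bool     outstanding;  /* listener has not cleared the last event yet */
};

/* Called on each write. Returns true if an event should be sent now,
 * false if it is folded into the one the listener still holds. */
static bool notifier_on_write(struct write_notifier *n)
{
    assert(n != NULL);
    n->counter++;
    if (n->outstanding)
        return false;
    n->outstanding = true;
    n->sent_at = n->counter;
    return true;
}

/* Called when the listener clears its event. Returns true if writes
 * happened in the meantime, so a fresh event must go out immediately. */
static bool notifier_on_clear(struct write_notifier *n)
{
    assert(n != NULL);
    n->outstanding = false;
    if (n->counter == n->sent_at)
        return false;
    n->outstanding = true;
    n->sent_at = n->counter;
    return true;
}
```

With this model a burst of writes produces one event while the listener holds it and one more when it clears, however long the burst was. A listener that wants fewer events simply delays the clear.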
Actually both read and write seem useless, as both can be bypassed by mmap...?

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
See the fastpath patch and explanation. Doesn't help for writes...
--
