Re: [malware-list] scanner interface proposal was: [TALPA] Intro to a linux interface for on access scanning


I have one word for you --- bittorrent.  If you are downloading a very
large torrent (say approximately a gigabyte), and it contains many
pdf's that are say a few megabytes apiece, and things are coming in
dribbles, having either an indexing scanner or an AV scanner wake up
and rescan the file from scratch each time a tiny piece of the pdf
comes in is going to eat your machine alive....

						- Ted
--

From: tvrtko.ursulin
Date: Monday, August 18, 2008 - 8:31 am

Huh? I was never advocating re-scan after each modification and I even 
explicitly said it does not make sense for AV not only for performance but 
because it will be useless most of the time. I thought sending out 
modified notification on close makes sense because it is a natural point, 
unless someone is trying to subvert it, which is out of scope. Others have 
suggested time delay and lumping up.

Also, just to double-check, you don't think AV scanning would read the 
whole file on every write?

--
Tvrtko A. Ursulin
Senior Software Engineer, Sophos

"Views and opinions expressed in this email are strictly those of the 
author.
 The contents has not been reviewed or approved by Sophos."
 

Sophos Plc, The Pentagon, Abingdon Science Park, Abingdon,
OX14 3YP, United Kingdom.

Company Reg No 2096520. VAT Reg No GB 348 3873 20.

--


You need a bit more than close I imagine, otherwise I can simply keep the
file open forever. There are lots of cases where that would be natural
behaviour - eg if I was to attack some kind of web forum and insert a
windows worm into the forum which was database backed the file would
probably never be closed. That seems to be one of the more common attack

So you need the system to accumulate some kind of complete in memory set
of 'dirty' range lists on all I/O ? That is going to have pretty bad
performance impacts and serialization.
--

From: tvrtko.ursulin
Date: Monday, August 18, 2008 - 8:58 am

No, I was just saying scanning is pretty smart, it's not some brute force 
method of scan all data that is there. It has a file type detection and 
what and how to scan is determined by that. If a file does not resemble 
any file type I don't think it gets scanned. For example take a couple of 
gigabytes of zeros and try to scan that with some products. I don't think 
they will try to read the whole file.

--
Tvrtko A. Ursulin
Senior Software Engineer, Sophos

--


trying to include details of where each file was updated means that you 
can't just set a single 'dirty' flag for the file (or clear the 'scanned' 
flags), you instead need to detect and notify on every write.

this is a HUGE additional load on the notification mechanism and the 
software that receives the notifications.

just sending "file X was scanned and now isn't" is going to be bad enough, 
you _really_ don't want to do this for every write.

David Lang
--

From: David Collier-Brown
Date: Monday, August 18, 2008 - 6:42 am

I suspect we're saying "on close" when what's really meant is
"opened for write". In the latter case, the notification would tell
the user-space program to watch for changes, possibly by something as
simple as doing a stat now and another when it gets around to 
deciding if it should scan the file. I see lots of room for
user-space alternatives for change detection, depending on how much
state it keeps. Rsync-like, perhaps?
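
A minimal sketch of the stat-now, stat-later idea, in Python for brevity. The
helper names snapshot() and changed_since() are made up for illustration, and
the choice of fields is an assumption; a rewrite that keeps the same size
within the timestamp granularity would slip past it.

```python
import os

def snapshot(path):
    """Return the stat fields that normally change when a file is rewritten."""
    st = os.stat(path)
    return (st.st_ino, st.st_size, st.st_mtime_ns)

def changed_since(path, earlier):
    """True if the file no longer matches the snapshot taken earlier."""
    return snapshot(path) != earlier
```

The scanner would call snapshot() when told a file was opened for write and
changed_since() when it gets around to deciding whether to scan.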

--dave
-- 
David Collier-Brown            | Always do right. This will gratify
Sun Microsystems, Toronto      | some people and astonish the rest
davecb@sun.com                 |                      -- Mark Twain
cell: (647) 833-9377, bridge: (877) 385-4099 code: 506 9191#
--


Or more precisely perhaps "on the file becoming dirty". A program that
opens for write, computes for an hour and writes out doesn't want to load
events down until it begins writing.

I agree "on close" is inaccurate for the scanner cases and that is why

Agreed.
--


trying to have every scanner program monitor every file that any program 
opens for write by doing periodic stat commands on it sounds like a very 
inefficient process (and unless they then get notified on close as well, 
how do they know when to stop monitoring?)

getting a notification on the transition from scanned -> dirty is much 
less of a load (yes, it does leave open the possibility of a file getting 
scanned multiple times as it keeps getting dirtied, but that's a policy 
question of how aggressive the scanner is set to be in scanning files)

David Lang
--


Make this a userspace problem.  Send a notification on every mtime
update and let userspace do the coalescing, ignoring, delaying, and
perf boosting pre-emptive scans.  If someone designs a crappy
indexer/scanner that can't handle the notifications just blame them, it
should be up to userspace to use this stuff wisely.
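
As a rough illustration of what userspace coalescing could look like, here is
a sketch in Python. The Coalescer class and its quiet_period parameter are
hypothetical; nothing like it is part of the proposed interface. It only
remembers the latest notification per path and reports a path once it has
been quiet for a while.

```python
class Coalescer:
    """Collapse a burst of change notifications into one pending scan per path."""

    def __init__(self, quiet_period):
        self.quiet_period = quiet_period
        self.last_dirty = {}            # path -> time of most recent notification

    def notify(self, path, now):
        """Record a change notification; repeats only refresh the timestamp."""
        self.last_dirty[path] = now

    def due(self, now):
        """Return (and forget) the paths that have been quiet long enough."""
        ready = sorted(p for p, t in self.last_dirty.items()
                       if now - t >= self.quiet_period)
        for p in ready:
            del self.last_dirty[p]
        return ready
```

A torrent client dribbling writes into one file then costs one dictionary
update per notification and a single scan after the writes stop.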

Current plans are for read/mmap to be blocking and require a response.
Close and mtime update are going to be fire and forget async change
notifications.  I'm seriously considering a runtime tunable to allow the
selection of open blocking vs async fire and forget, since I assume most
programs handle open failure much better than read failure.  For the
purposes of an access control system (AV) open blocking may make the
most sense.  For the purposes of an HSM read blocking makes the most
sense.

Best thing about this is that I have code that already addresses almost
all of this.  If someone else wants to contribute some code I'd be glad
to see it.

But let's talk about a real design and what people want to see.

Userspace program needs to 'register' with a priority.  HSMs would want
a low priority on the blocking calls, AV scanners would want a higher
priority and indexers would want a very high priority.

On async notification we fire a message to everything that registered
'simultaneously.' On blocking we fire a message to everything in
priority order and block until we get a response.  That response should
be of the form ALLOW/DENY and should include "mark result"/"don't mark
result."

If everything responds with ALLOW/"mark result" we will flip a bit IN
CORE so operations on that inode are free from then on.  If any program
responds with DENY/"mark result" we will flip the negative bit IN CORE
so deny operations on the inode are free from then on.
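
To make the blocking path concrete, a small Python model of it follows. This
is only a sketch under assumptions: check_access(), invalidate() and the
marks dictionary are invented names, a dict stands in for the in-core bits,
and "priority order" is taken to mean lowest number first.

```python
ALLOW, DENY = "ALLOW", "DENY"

def check_access(inode, scanners, marks):
    """Ask each registered scanner about `inode`, lowest priority number first.

    scanners: list of (priority, callback); callback(inode) -> (verdict, mark_result)
    marks:    dict standing in for the in-core allow/deny bit per inode
    """
    if inode in marks:                      # cached verdict: no userspace round trip
        return marks[inode]
    all_marked = True
    for _prio, callback in sorted(scanners, key=lambda s: s[0]):
        verdict, mark_result = callback(inode)
        if verdict == DENY:
            if mark_result:                 # DENY/"mark result": cache the negative bit
                marks[inode] = DENY
            return DENY
        all_marked = all_marked and mark_result
    if all_marked:                          # everyone said ALLOW/"mark result"
        marks[inode] = ALLOW
    return ALLOW

def invalidate(inode, marks):
    """Drop the cached verdict, e.g. when the file is modified."""
    marks.pop(inode, None)
```

After the first full round of ALLOW/"mark result" answers, later calls for
the same inode return from the dictionary without consulting any scanner.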

Userspace 'scanners' if intelligent should have set a timestamp in a
particular xattr of their choosing to do their own userspace results
caching to speed up things if the inode is evicted from core.  ...

No can do - you get stuck with recursive events with the virus checker



file handle. Really you need to give the handle of the object because it
may not have a name or a meaningful inode number

--

From: douglas.leeder
Date: Monday, August 18, 2008 - 9:54 am

And the opposite approach can't work because the AV scanner + the index 
scanner 
need the HSM to do its work before they can scan.

I guess the only way it could work is to have levels:
e.g.
HSM agent is Level 1
AV scanner is Level 2
Index scanner is Level 3

When you register at Level N, you are excluded from all blocking/scanning 
at Levels >= N,
but your ops are still passed to Level < N.

An example is a little hard to craft because HSM and indexing catch 
different operations. :-)
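
The level rule itself is easy to sketch, even if a realistic scenario is not.
The Python below is only a model: scanners_for() is a made-up name, and
treating an unregistered process as "consult everyone" is an assumption.

```python
def scanners_for(requester_level, registered):
    """Return the scanners that must see an operation from this requester.

    registered:      dict mapping scanner name -> level (1 = lowest, e.g. the HSM)
    requester_level: level the calling process registered at, or None for an
                     ordinary process that is not a scanner
    """
    if requester_level is None:
        names = list(registered)
    else:
        # A process at Level N is only subject to scanners at levels below N.
        names = [n for n, lvl in registered.items() if lvl < requester_level]
    return sorted(names, key=registered.get)
```

With HSM at 1, AV at 2 and the indexer at 3, the indexer's own reads still go
through the HSM and the AV scanner, while the HSM agent's reads go through
nobody.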
 
-- 
Douglas Leeder


--


Something like that would do the trick for any simple dependencies yes.

Alan
--


My last interface was single leveled and was able to efficiently stop
recursion by simply excluding all processes which were scanners.  It was
implemented as a flag in the task_struct.  I could probably go the same
route and just exclude all kernel initiated scanners from all scanning
operations.  I also included an interface for a process to be completely
excluded, but given multi-level scanners I don't think an 'exclude all'
is appropriate.

I could add a separate interface for background/purely userspace
scanners to register their level and only call scanners from the kernel
with a lower level.  Not sure what security I'd want to put around this

the single in core allow/deny bit is so that the vast majority of
operations are completely free.  Say we scan/index /lib/ld-linux.so.2
once.  Do you really want every single read/mmap operation from then on
to have to block waiting for the userspace caches of your HSM, your AV
scanner, and your indexer?  If all three tell the kernel they don't need
to see it again and that information is easy and free to maintain, lets

I think I'm going to stick with my special file in securityfs since it
makes it so simple to install the fd in the scanning process (as
opposed to netlink where I don't even know how it would be possible...)

-Eric

--


this is why the proposal caches the results of all the scanners with the 
file (in the xattrs), rather than having each scanner store its own scan 
results

David Lang
--


AF_UNIX passes file handles just fine. I'm not sure netlink will help you
here anyway - isn't it lossy under load ?

Also securityfs is more special purpose magic here - what does it have to
do with a general purpose notifier API ? I'd actually generalise the
notifier properly and go for a syscall.

Alan
--


But the file being installed needs to be at least RD for AV/Indexer.
Particularly of interest to people here would be a file opened O_WRONLY
and then the indexer wouldn't have the ability to read the data that was
just written.  So we need a new FD, can't just send the old one.

I'd also assume that an HSM would need a WR file descriptor, which isn't
easy.  I've found that (through trial and error not understanding the
code) trying to make new descriptors for the new process have WR often
returned with ETXTBSY....

I think I might just give RO file descriptors and if an HSM comes along

Well, securityfs is really just a location for a bunch of interfaces.
The real 'magic' is that I defined my own read and write functions on a
special inode.

Let's assume a new syscall (and I know you tried to describe this to me
before but I can't remember it and I'm having some e-mail history
trouble) what would it look like?  A scanner constantly calls scan() to
block for data to be scanned?  So an AV, HSM, or indexer all would be
blocking in scan() just waiting for data?  How do they respond?  How is
it better, cleaner, or more general than a 'special' file they
read/write?

-Eric

--


Also not knowing much about sending FD's over AF_UNIX sockets, do they
share the same seek offsets or does the new process get a new fd which
points to the same data?  I wouldn't want to have to count on the
indexer to not move the offset around on the bittorrent client.  Like I
said, having never used sendmsg to pass a socket, I don't know what you
get on the other end.

-Eric

--


man pread(). The posix committee long ago figured out that you needed a
sane way not to get involved with offsets even between processes or in
threaded apps.
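
A short Python illustration of the point. A descriptor received over an
AF_UNIX socket refers to the same open file description as the sender's, so
the two share one offset, just as copies made by dup() do. The peek() helper
is a made-up name; os.pread() is the standard wrapper around pread().

```python
import os

def peek(fd, length, offset):
    """Read `length` bytes at `offset` without moving the descriptor's file offset."""
    return os.pread(fd, length, offset)
```

A scanner that only ever reads through peek() leaves the client's position
alone, whereas a plain read() on the shared descriptor would advance it for
both processes.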

Alan
--


The devil is in the details, and besides everyone trying to heap other
things on, one thing that keeps getting brought up, and seemingly keeps
getting ignored is the fact that there already is a perfectly reasonable
interface to pass file system events (open, close, read, write, etc) to
userspace applications in the form of FUSE which has already in some
ways solved issues wrt. subtle deadlocks that can happen when you bounce
from an in-kernel context to a userspace application.

Fuse is definitely the way to go for HSM. But even for one of the
various threat models I've read in the past couple of days it would be
perfect. i.e. not allowing Linux servers to be used as a means to
propagate viruses for other machines.

The trick is to have a scanned view on the file storage through a FUSE
mount, and then have samba/knfs/apache/etc. export only the fuse mounted
tree or chroot the daemons under the scanned part of the namespace. This
provides an excellent way to separate 'trusted' applications from
non-trusted by leveraging the namespace. In fact the raw data can easily
be stored in such a way that it is owned and accessible only by fuse's
userspace process (and root) so that even without chroot, local users
can only access the data through the fuse mount/scanning layer.

And the kernel parts are already implemented, doesn't require new
syscalls, or placing policy about which processes happen to be
'privileged' in the kernel and solves several nasty deadlocks that can
happen when you start blocking processes in their open, close, read,
write or page faulting code paths.


They all block at different places because they all have very different
requirements.

HSM blocks in open before the file data is present because that still
needs to be fetched. AV scan blocks after the file data is accessible
but before returning to the application and the indexer only cares about
being notified after a open for write/mmap releases the last (writing)
reference to the file, since it seems to ...

Can you help me write/prototype something that will work for every
regular file anywhere on the system including the kernel binary
in /boot, the glibc libraries in /lib/ld-linux.so, /sbin/ldconfig and
every file on every USB stick you put into the machine?  When all of
these are on separate partitions?  Every file under / needs to be
exported to the scanner.  I'm very willing to believe fuse is the way to
go for an HSM, but I don't see how to get every single file on the
system through the FUSE based scanner.

Yes propagation is an important use of file scanning (maybe the
biggest), but we clearly can't secure every part of the border, and I
don't know how to use fuse to do it all rather than just pieces and
parts.

You're absolutely right about this thread droning on.  But I've got code
that solves the problems.  If someone else shows me better code rather
than talk I'm all for it!

-Eric

--


the issue is that the kernel developers are not that interested in 
creating one-off interfaces for anti-virus scanners. If the interfaces are 
more general and able to be used for a wider variety of problems they are 
much more interested in having them implemented.

unfortunately you went off and developed a bunch of code before talking to 
people about what the appropriate interfaces would look like, (this is a 
common problem, see the 'how to participate in the kernel' document at 
http://ldn.linuxfoundation.org/book/how-participate-linux-community)

David Lang
--


Having talked with Eric a bit more it looks like we have some fairly 
fundamentally different views on the scope of how this sort of thing would 
be used and that is causing us to talk about different things.

we are both looking at the threat model of trying to provide hooks so that 
data is scanned before allowing programs to use it (neither of us is 
talking about trying to do LSM things). where we differ is the uses we 
expect the hooks to be put to.

please note that I am trying to state Eric's position, I may be mistaken.

Eric is viewing this through the AV point of view,
   this means

He doesn't expect there to be many scanners on a system
   initially he was only thinking of one.

He expects the interactions between the scanners to be simple (i.e. all 
scanners must bless a file or it's considered bad)
   the policy is simple and will always be the same

He expects AV signatures to change rapidly
   so storing the results of scans is of very limited value

He is expecting the scanning software to set the policy
   so there is no reason to have a system/distro defined policy

He is thinking that any ability to avoid doing the scan is a security 
hole.


these things are leading him to the kernel-based implementation that he 
posted.



I am seeing things (I think) a bit more broadly (definitely differently).

I think that the availability of a general 'this file was written to' 
interface in the kernel combined with 'take action before opening' will 
lead to many uses beyond AV work.
   these include things like filesystem indexers, HSM systems, backup 
software as well as security scanners (IDS, 'tripwire', as well as AV)

I expect to see IDS type scanners, possibly multiple ones on a machine, 
each fairly simple and not trying to do a complete job, but authoritative 
within its area.
   this means that the interaction between approvals is more complex and 
not something that should be coded into the kernel, it should be 
configured in ...
From: Eric Paris
Date: Wednesday, August 20, 2008 - 8:15 am

Not quite.  I believe it should be the responsibility of the scanner to
determine how and if they want to store the results of the scan.  I'm
willing (and want) to provide a simplistic kernel fast path if all of

I'm not sure of the definition of this 'policy' but, yes, I think all

At the moment I'm leaning towards a separate async notification system
for open/mtime change/close which will be a fire and forget notification
system with no access control mechanism.

A second, although very similar, mechanism will block on read/mmap
(although I'm not so sure how to handle O_NONBLOCK without a kernel
fastpath/inode marking that actually gets used, this is really a serious
design issue with putting this at read/mmap.  I don't think we are ready
to give up on O_NONBLOCK for S_ISREG altogether just yet are we?) and
provide access control.  I also plan to include open() in the
blocking/access control based on a /proc tunable.  If applications can't
handle EPERM at read/mmap they can get it at open and suffer the

I don't understand how something can be 'authoritative within its area'
and still have a 'complex interaction policy.'  I see it as, either yes
means yes and no means no, or it isn't authoritative.

If two scanners need some complex interaction it certainly needs to be
in userspace, no question there.  Sounds like a dispatcher daemon needs
to get the notification from the kernel and send them to the scanners
and then let it do all sorts of magic and sprinkling of pixie dust
before the daemon returns an answer to the kernel.  In the end that
daemon is the authoritative thing.  I don't plan to write it since I
don't see the need, but the possibility of writing a dispatcher daemon
certainly exists if there is actually need.

Everything says yes at read/mmap we allow.  Anything says no we deny.
If you need more than that, write an intermediary daemon in userspace to

My answer is that if they want to store whatever it is they care about
across boots so the scanner can write ...

as an example.

if the system package manager says the syslogd binary doesn't match the 
checksum that it has recorded should it be prevented from running? (a 
strict policy would say no, but the sysadmin may have recompiled that one 
binary and just wants a warning to be logged somewhere, not preventing the 
process from running)

what happens if scanner A (AV scanner) says that a binary has a virus in 
it, but scanner B (IDS scanner checking checksums) says that it's the 
right version? what mechanism do you have to say that a yes from scanner B 

that could work, the need to have the userspace daemon to do the more 
complex things was part of what was pushing me to think in terms of 
userspace hooks for open/read/mmap/etc instead of kernelspace hooks 
(avoiding the context switches you mentioned in an earlier message because 

without the kernel support to clear the flags when the file is dirtied how 
can these programs trust the xattr flags that say they scanned the file 
before?

you also mention using mtime, I don't think that's granular enough. we 
already have people running into problems during compiles with fast 
machines not being able to detect that a file has changed by the mtime.

I'm not saying that xattr is the only way to store the info, it just seems 
like a convenient place to store them without having to create a 
completely new API or changing anything in on-disk formats.

the real requirements that I see are more like this

1. must be able to be cleared by the kernel when the file is dirtied

2. must be able to be persistent across reboots

3. should allow free-form tags to be stored by scanners

4. if it's deemed necessary to close the race condition of a file getting 
modified while the scanner is scanning it, there should be an 'atomic to 
userspace' call to set a tag IFF an old tag exists. This is a new API 
call, but would only need to be used by the scanners.
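
A toy Python model of requirements 1 and 4 together, to show how the "set a
tag IFF an old tag exists" call would close the race. Everything here is an
assumption made for illustration: the TagStore class, its method names and
the "scanning" / "clean-sig42" tag values are invented, and a lock-protected
dict stands in for state the kernel would own.

```python
import threading

class TagStore:
    """Stand-in for per-file scan tags that the kernel would maintain."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tags = {}                 # path -> {scanner name: tag value}

    def set_tag(self, path, scanner, value):
        with self._lock:
            self._tags.setdefault(path, {})[scanner] = value

    def dirty(self, path):
        """The file was written: clear every tag on it (requirement 1)."""
        with self._lock:
            self._tags.pop(path, None)

    def set_if(self, path, scanner, old, new):
        """Set the tag to `new` only if it is still `old` (requirement 4)."""
        with self._lock:
            tags = self._tags.get(path)
            if tags is None or scanner not in tags or tags[scanner] != old:
                return False            # tag was cleared or replaced meanwhile
            tags[scanner] = new
            return True

    def get(self, path, scanner):
        with self._lock:
            return self._tags.get(path, {}).get(scanner)
```

A scanner would set an in-progress tag, scan, then call set_if() to swap in
its result. If the file was dirtied during the scan the in-progress tag is
gone, set_if() returns False, and the stale result is never recorded.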

while #3 can cause conflicts between scanners, I don't expect that in ...
From: Eric Paris
Date: Wednesday, August 20, 2008 - 12:26 pm

My belief is that if you choose to run a file scanner and that file
scanner gets the answer wrong you need to look at the file scanner.
There shouldn't be arbitrary overrides.  If you don't accept the results
of the scanner what's the point?  Tell your package manager scanner that

I don't understand what you mean about trust.  This is an argument for
kernel support now?  What is it that you say needs and what doesn't need

And I'm saying we don't actually need any of this and if it is actually
needed by someone in the real world they can easily build their own
solution on top of my generic interface.  I'm not making the assertion
it is race free and don't think it is possible without making everything
sequential (hahahaha.)  But I claim in the face of normal operation it's
fine.  My interface, as proposed, is very generic.  Much more so than
what I think you are trying to describe.  I couldn't make mine more
minimal or broad.

--


and this is the core disagreement we have. I don't trust the AV vendors 
that much. I want there to be some way for me to disagree with them.

I've had AV false positives a few too many times where it flagged critical 

if a program set an xattr to say that it scanned the file and then the 
system reboots, how can this program know that the file hasn't been 
modified since that xattr was set? your in-memory data is gone, so you 

1. a flag mechanism (namespace in xattr or something else) that allows for

1a. scanners to store arbitrary tags in a way that will survive a reboot.

1b. when the file is dirtied the kernel clears all flags that have been 
set on this file.

   especially for mmap access the kernel is in a good position to detect 
that the file has been dirtied, but nothing else is.

1c. when the kernel detects a formerly clean file getting dirtied it sends 
a message to userspace in a way that multiple scanners can receive the 
alerts

1d. to close the race of the file being modified while it's being scanned, 
add a system call (atomic as far as userspace is concerned) that sets a 
new tag IFF an old tag is set

2. on access a check is done against a list of 'required' tags, if not all 
tags are present the master scanning logic is invoked.

   for several reasons I've been thinking that this step could/should be 
done in userspace

If done in userspace

2a. define a place to record the 'required' tags.

   one way to do this is to have a directory for it, programs define 
'required' tags by creating a file with that as its name with the 
contents of the file including the scanner name and the command line to 
execute to perform a scan

2b. define a way of stacking different scanners (being able to define what 
to do if each scanner says "yes", "no", "I don't know", "the file changed 
under me", and "the file changed under me, but I think I found a problem", 
and what to do with the combination of different answers).

   it may be that something ...

I realized I need to reply to this part just after hitting send on the 
reply to the rest of it.

part of the policy that needs to be set is when scans do and don't need to 
be done.

you almost never want to have 'scans' take place when scanners access 
files (the HSM restore is the only exception), and there are significant 
performance benefits in exempting other programs as well.

you are saying that the decision of which programs to skip and which ones 
to not skip should be the responsibility of the scanner. I disagree for a 
couple of reasons

1. I don't think that the scanner can really know what program is trying 
to do the access.

2. I think the policy of which files to limit to scanned data and which 
ones to allow access to unscanned data should be a sysadmin decision 
(assisted by the distro), not something set through the scanning software. 
In short I don't trust Symantec, McAfee, etc to make the correct decisions 
for all the different Linux distros out there, or for the different 
scanners to provide sane, consistent interfaces to specify this sort of 
thing. I expect each of them to take the attitude that they know what's 
best, and hard-code the policy with little (if any) allowance for 
exceptions, and that exception list would be managed differently for each 
scanner.

David Lang
--

From: douglas.leeder
Date: Thursday, August 21, 2008 - 7:35 am

I think these are excellent ideas. 

The kernel really does have to keep some record if it's going to do any 
scanning from read() calls, it can't go to userspace each time to check
if a file is cached.
(It might be the single open file descriptor that's marked though)

O_NONBLOCK is then handled nicely, and we can avoid ever blocking that 
client process (which given they're trying non-blocking IO is probably 
a good thing).

-- 
Douglas Leeder




--


if you say that the hooks are in the kernel then it does make sense to try 
and have the checking of the cached state done in the kernel.

this will mean that the kernel will have to have some way of knowing which 
programs it should block when the check fails and which ones it should let 

I don't see how this follows.

David Lang
--

From: Pavel Machek
Date: Friday, August 22, 2008 - 8:09 am

That's contrary to the threat model ('it is just a scanner').

(Plus you can't do it. mmap. Of course you can pass viruses between
two cooperating applications... and you can do it through filesystem,
too. And you probably can make un-cooperating network server serve
viruses, as long as the network server uses mmap.)

This is the thing that makes antivirus ugly, its unique to the
antivirus, plus it can't be done. I.e. bad goal.


							Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--


by the way, sendfile and splice will probably also cause grief (or at 

the items that I see as the potentially difficult policy decisions

1. when to scan files on access

and the more difficult issue,
2. when to allow access to unscanned files

3. what to do if different scanners disagree with each other


I think Eric's answers would be

1. unless they are already marked as being scanned since rebooting

2. only when a scanner program is doing the access, unless the scanner 
programs all decide differently.

3. only allow access if all scanners agree.


My answers are

1. unless they have already been marked by the current generation of 
scanner signatures

2. depends wildly on the environment. some uses will want to follow Eric's 
very strict policy, others will only want to impose the on-access scanning 
on software expected to be exposed to windows clients, yet others will 
want to scan by default, but exempt programs that don't interpret their 
input (for example 'wc')

I also don't know how the kernel could reliably figure out what program is 
asking for access. I guess you could try to do something with SELinux 
tags, but that makes this system dependent on SELinux, plus since you can 
only have one tag on the program it will potentially double the number of 
unique tags on the system, with a significant complication to the ruleset 
to make each of the tags identical, except for this one function.

3. like #2 depends wildly on the environment and what scanners are in use. 
I could easily see a 'majority vote wins' with three (or more) AV scanners 
in use. I could also see having a checksum based scanner override the 
decision of a heuristic based scanner
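
To show how small these policies are once they live in userspace, here is a
Python sketch of the three mentioned so far. The function names and the
"yes" / "no" / "unknown" verdict strings are invented for the example; how
verdicts actually reach the policy code is left open.

```python
from collections import Counter

YES, NO, UNKNOWN = "yes", "no", "unknown"

def unanimous(verdicts):
    """Strict policy: allow only if every scanner said yes."""
    return all(v == YES for v in verdicts.values())

def majority(verdicts):
    """Allow if more scanners said yes than said no; a tie denies."""
    counts = Counter(verdicts.values())
    return counts[YES] > counts[NO]

def with_override(verdicts, authority, fallback):
    """Let one scanner's definite answer win; otherwise defer to `fallback`."""
    answer = verdicts.get(authority, UNKNOWN)
    if answer in (YES, NO):
        return answer == YES
    return fallback(verdicts)
```

For example, with_override(verdicts, "checksum", majority) lets a checksum
scanner overrule the AV scanners whenever it has a definite opinion and
falls back to a vote when it does not.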

David Lang
--


...

Going back to your own email,

| From: Eric Paris <eparis@redhat.com>
| Date: Wed, 13 Aug 2008 12:36:15 -0400
| Message-Id: <1218645375.3540.71.camel@localhost.localdomain>
| Subject: TALPA - a threat model?  well sorta.
...
| The value of a file scanning interface is not in stopping active
| attacks.  It's in making the Linux platform a more difficult location for
| the storage and propagation of bad data.  I think it's reasonable to
| think that we all agree we don't want to be the preferred hosting
| platforms for trojan binaries intended to attack other non-Linux
| systems.  Why would one want consciously choose to leave Linux as a
| safe haven for the existence of malware?  Even though the malware is not
| attacking the Linux platform do we as the Linux community really want to
| be a breeding and hosting ground as long as the costs are not too high?
...

This is the threat model I addressed in my email. So now you change the
model to something where the malware is in fact attacking the Linux

Have a modified initrd which contains fuse and the scanner and when
mounting the root file system also start the scanner + fuse mount, and
then instead of pivoting into the root-fs, pivot into the fuse mounted
one.

Of course that doesn't yet deal with non-root disks or external devices
that are mounted later on. This would require a modified mount sequence
so that the mount action is completed by a daemon that is running in the
trusted root next to the scanning process so that the new mountpoint is
correctly placed underneath the scanning layer. Maybe it is possible to
start a new fuse/scanning process for every mount as well, but then you
may get some things scanned multiple times because the scanner in the
lower layer ends up having to verify/validate the actions of the
scanners in higher layers.

But at some point it really is just an exercise in who or what you
trust. Because if you at least are willing to trust that the root disk
has not been compromised and the ...

why do you need to introduce a priority mechanism for notifying scanners? 
just notify them all and let them take action at their own rate for async 
notifications. if you are waiting for their result you need to invoke them 
directly, but whether you do it in order or in parallel will depend on the 
config. this is an optimization problem that the kernel should not be 
trying to figure out because the right answer is policy dependent (if they 
all need to bless the file for it to be accepted then fire them off in 
parallel (system resources allowing), if one accepting it is good enough 
you may want to just run that one and see if you get lucky rather than 



you can't trust timestamps, they go forwards and backwards. they need to 
have some sort of 'generation id' but don't try to define meaning for it, 
leave that to the scanner. have everything else treat it as a simple "it 

you keep planning to do this with a single allow mark. it may not be that 

as several others have noted, alerting on close is not good enough, we 
need to alert on the scanned->dirty transition (by the way, this 
contradicts the part of your message I snipped where you were advocating 

if you are already accessing xattrs, why not just use the value rather 

having scanners access a file blocking on read won't work for multiple 

this is easy, the userspace library (libmalware or glibc) intercepts the 
open and is invoking the scanners if the checks tell it to. they can send 
the file descriptor over a unix socket on the machine to a scanner daemon, 
or they can invoke the scanner in the existing user context.
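
A rough sketch of the descriptor handoff described above, in Python
rather than inside libc. Everything named here (TOY_SIGNATURE,
send_for_scan, scan_one, the ALLOW/DENY replies) is made up for
illustration; only socket.send_fds and socket.recv_fds (Python 3.9+,
Unix only) are real library calls. A socketpair stands in for the
connection to the scanner daemon.

```python
import os
import socket
import tempfile

TOY_SIGNATURE = b"EVIL"  # hypothetical marker, stands in for a real scan engine


def read_all(fd, chunk=65536):
    """Read the whole file with pread so the shared file offset is untouched."""
    parts, offset = [], 0
    while True:
        block = os.pread(fd, chunk, offset)
        if not block:
            break
        parts.append(block)
        offset += len(block)
    return b"".join(parts)


def send_for_scan(sock, fd, label):
    """Interception side: pass an already-open descriptor to the scanner."""
    socket.send_fds(sock, [label.encode()], [fd])


def scan_one(sock):
    """Daemon side: receive one descriptor, scan it, send back a verdict."""
    label, fds, _flags, _addr = socket.recv_fds(sock, 1024, 1)
    try:
        clean = TOY_SIGNATURE not in read_all(fds[0])
    finally:
        for fd in fds:
            os.close(fd)  # the daemon owns its copies of the descriptors
    verdict = b"ALLOW" if clean else b"DENY"
    sock.sendall(verdict)
    return label.decode(), verdict


def demo(content):
    """Run both sides in one process over a socketpair (AF_UNIX on Unix)."""
    client, daemon = socket.socketpair()
    with client, daemon, tempfile.TemporaryFile() as f:
        f.write(content)
        f.flush()
        send_for_scan(client, f.fileno(), "download.pdf")
        scan_one(daemon)
        return client.recv(16)
```

The daemon reads with pread because the passed descriptor shares its
file offset with the caller's; seeking in the daemon would move the
caller's position too.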

David Lang
--


You have some pretty serious reading comprehension problems with my
message.  Try reading it all over again, keeping in mind (it was not
stated, but I thought it was understood) that the priority was only of value



Not my problem.  Userspace needs to make their own determination and
cache their own results from async scans.  Kernel fires and forgets on
async.  It's up to userspace to make those notifications useful if they

Is it really that hard to understand what I'm saying?  We notified on
mtime update and cleared the "mark result".  Why shouldn't we notify on

I'm not accessing anything.  I'm leaving xattrs as an exercise of
efficiency for people who want to write a userspace scanner.  Not my


But, I have code for my solution that addresses just about every problem
mentioned on list so far except for multi priority blockers.  Where is
your code?

-Eric

--


if you have multiple things reading from a pipe only one will get any one 
message, right?
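
A small illustration of the point, assuming fixed-size records so that
each read takes exactly one message. Two descriptors on the same pipe
stand in for two scanner processes; the names are illustrative only.

```python
import os

RECORD = 8  # fixed record size so each read consumes exactly one message


def demo_competing_readers():
    """Two readers on one pipe: every message is consumed exactly once."""
    read_fd, write_fd = os.pipe()
    reader_a = read_fd
    reader_b = os.dup(read_fd)  # second reader sharing the same pipe
    try:
        os.write(write_fd, b"event-01")
        os.write(write_fd, b"event-02")
        got_a = os.read(reader_a, RECORD)
        got_b = os.read(reader_b, RECORD)
        return got_a, got_b
    finally:
        for fd in (reader_a, reader_b, write_fd):
            os.close(fd)
```

Neither reader sees both events, so fanning the same notification out
to several scanners would need one channel per scanner.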

David Lang
--


S_ISREG()  ?   We're talking about file scanning not communication
interception.  Arjan strongly wanted me to push this down to read, but I
still planned to not look at lnks, dirs, chars, blocks, fifos and
sockets...

--


if it doesn't read the entire file and only reads the parts that change, 
out-of-order writes (which bittorrent does a _lot_ of) can assemble a 
virus from pieces and the scanner will never see it.

as for Ted's issue, the scanner(s) would get notified when the file was 
dirtied, they would then get notified if something scanned the file and it 
was marked dirty again after that. If nothing got around to scanning the 
file then all the following writes would not send any notification because 
the file would already be dirty.
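
A toy model of that notification rule, assuming a per-file dirty flag
and a callback for the async notification. DirtyTracker and its method
names are invented for this sketch and are not taken from any patch in
the thread.

```python
class DirtyTracker:
    """Notify only on the scanned -> dirty transition."""

    def __init__(self, notify):
        self._notify = notify  # callable(path), the async notification
        self._dirty = set()    # paths written since their last scan

    def on_write(self, path):
        """Called for every write; returns True if a notification was sent."""
        if path in self._dirty:
            return False       # already dirty, stay quiet
        self._dirty.add(path)
        self._notify(path)
        return True

    def on_scanned(self, path):
        """Called when a scanner has finished with the file."""
        self._dirty.discard(path)
```

With this rule a torrent client writing a thousand pieces into one file
produces a single notification until some scanner actually gets around
to the file.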

David Lang
--

From: tvrtko.ursulin
Date: Tuesday, August 19, 2008 - 1:40 am

No, it would catch it once it gets assembled. It doesn't read the parts 

This sounds like a good strategy.

--
Tvrtko A. Ursulin
Senior Software Engineer, Sophos

--


Why do you think non-malicious applications won't write after close /
keep file open forever?
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--


If you ask this one more time without reading the many times I've
answered these questions I think I'm going to explode.

Permissions checks are done on open/read.  Decisions are invalidated at
mtime update, which INCLUDES mmap after close!  I don't care if you keep
your file open forever, if you wrote to it, we are just going to scan it
next time a process decides to open/read it.  Please stop confusing this
already long and almost pointless thread with implementation details
that have repeatedly been explained.
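
A minimal sketch of the behaviour described above, assuming a cached
per-file decision that is dropped on modification and recomputed on the
next open/read. AccessGate and its hooks are hypothetical names, not
the interface of the actual patch set.

```python
class AccessGate:
    """Check on open/read, invalidate on modification, scan lazily."""

    def __init__(self, scan):
        self._scan = scan     # callable(content) -> bool, True means allow
        self._decision = {}   # path -> cached allow/deny result
        self.scans = 0        # how many real scans were performed

    def on_modify(self, path):
        """mtime update: forget the old decision, do not scan now."""
        self._decision.pop(path, None)

    def on_open(self, path, content):
        """Blocking check: reuse the cached decision or scan once."""
        if path not in self._decision:
            self.scans += 1
            self._decision[path] = self._scan(content)
        return self._decision[path]
```

A file that is kept open and written forever costs nothing here until
some process tries to open or read it; only then is it scanned once.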

-Eric

--


You will see later where what you just said fails, and the issue is
preventable too: a downloader with a built-in previewer.

Funnily enough, the solution to this is fairly simple, but it does
require looking at white list methods and LSM.

There are two major ways.  In the first, a white list format check
tells you that the file is not complete enough, so black list scanning
is not required yet.  That is lighter than running 5000 black list
signatures over it each time a new block gets added.

Unfortunately a scan of all pieces as they come in for possible threats
still has to be done, because with videos and the like people in some
applications start playing them before the download is complete.
Lots of video and audio formats have blocks that can be cleared piece
by piece.   Just as a bittorrent client can scan and pass block by
block, a white list scanner needs to be able to as well.  A more
creative use is void block insertion into some formats, i.e. a partly
downloaded block at X is replaced with the equivalent of a blank block
for a player that is external to the download tool.   This is damaged
data access prevention.  It nicely prevents some stability issues and
gives users extra features.

A white list scanner that knows the format can detect when enough
segments are in a file to run a black list scan, avoiding jackhammering
the CPU-eating black list scan.

Dealing with bittorrent clients with built-in preview is a pain in the
you know what, since are they reading the file to send to someone
else, are they reading the file to display in their internal viewer, or
do they take it straight from their download buffer to the internal
viewer?  Even worse, lots of bittorrent streams are encrypted and cannot
be scanned while they are network packets.   So the second solution
requires an LSM around the downloader, preventing it, in case of a
breach, from being able to go anywhere in the system.   The LSM only
allows other applications with more rights to access files that the
downloader has downloaded once they have passed white list and any
needed black list scanning.

Getting this to work ...
From: douglas.leeder
Date: Tuesday, August 19, 2008 - 1:09 am

You seem to have some very funny ideas about what white-listing and
black-listing scanners do.

Checking filetypes and checking for complete/non-corrupt files is
something black-listing scanners do.

Whereas whitelisting:
"An emerging approach in combating viruses and malware is to whitelist
software which is considered safe to run, blocking all others"

While ensuring media files are complete could be done by a scanner that

So?

We're not talking about throwing away LSM, or replacing it in any way.

This discussion is about an additional scanning path, for files, for any 
kind of content-based 


The thing is, Windows has had built-in white-listing for a long
time, and yet there is still a market for AV scanners. This suggests
people don't like white-listing.

Also consider all of the problems and criticism Vista's UAC has had. And
UAC is only white-listing privileged operations.

-- 
Douglas Leeder

--

From: Peter Dolding
Date: Tuesday, August 19, 2008 - 4:08 am

Pure Black List scanners don't.   A pure Black List is looking for a
known threat; that is the end of their skill.

A lot of current black list scanners are hybrid.   They are using
known-format white lists for sorting their databases.   Basically they
are using

You are missing something critical.   Really critical.  There are such
things as Heuristic White List scanners.

Heuristic White List scanners are something you use on document
archives or for faster sorting out of a system breached by an unknown
somewhere in a stack of document files.

The major difference between a Heuristic Black List and a Heuristic
White List is that a Heuristic White List does not contain threats.
Instead a Heuristic White List contains knowledge about the formats it
is processing and the allowed exceptions alone.    So a doc file
containing a macro would be thrown out as a threat by a Heuristic White
List unless the macro in the doc is included in the list of exceptions.
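
A toy version of the exception-list check described above, assuming
macros have already been extracted from the document as byte strings
and that exceptions are identified by a SHA-256 digest. Both
assumptions, and the function names, are illustrative only; a real
scanner would also need format-aware parsing, which is not shown.

```python
import hashlib


def macro_id(macro_source: bytes) -> str:
    """Identify a macro by the SHA-256 digest of its source."""
    return hashlib.sha256(macro_source).hexdigest()


def unlisted_macros(macros, allowed_ids):
    """Return the macros that are NOT on the exception list.

    An empty result means the document passes the white list check;
    anything returned would cause the document to be thrown out.
    """
    return [m for m in macros if macro_id(m) not in allowed_ids]
```

Note that the list holds only what is permitted; nothing in it
describes a threat.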

You find that Heuristic White Lists are operating inside a lot of the
black list systems.   The reason is that without them their threat
processing simply could not be done.

The damaged file detection in current day AVs is not black list, it is
white list.  People have mixed the two techs and want to forget the
divide.

Forced to use it, they then hide what the white list system finds
from the user, basically on the idea that if they give you notice you
will do the wrong thing.  Anti-virus companies are basically taking
the attitude that white lists are to be used but never shown to the user.

A truly operating Heuristic White List that knows file formats gives
users a few options when it detects something questionable: access the
document without the threat (i.e. produce a new copy of the doc with
the macro stripped), run it past a black list for that kind of threat,
quarantine it, or delete it.   Current AVs automatically pick the
run-past-black-list option from their Heuristic White List.

This current auto pathing into black list has quite a few downsides.
 Black List section is going to get longer so one day processing ...
From: douglas.leeder
Date: Monday, August 18, 2008 - 9:28 am

What size is a tribble? :-)

If we assume that the bittorrent client is closing and re-opening the
file each time it's got a nice piece of the file? (Otherwise I don't
think we'll have a performance problem)

Then there may be room for an optimisation of the following form:
For a file X.
If X is only on a local disk.
If X was written from empty by process A and only process A.
Then don't scan attempts to open by process A.

But that sort of optimisation can either be done in user-space, or in a
future kernel modification.

I haven't fully analysed this - it assumes that reading data into
process A, that process A wrote out is safe, regardless of the data.
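
A user-space sketch of the rule above, under the same unanalysed
assumption that a process reading back its own output is safe.
SoleWriterTracker, its hooks, and the was_empty/is_local flags are
hypothetical names for this illustration.

```python
class SoleWriterTracker:
    """Skip scanning when the opener is the only process that wrote the file."""

    def __init__(self):
        self._writer = {}      # path -> pid of the sole writer so far
        self._tainted = set()  # paths that no longer qualify

    def on_write(self, path, pid, was_empty, is_local):
        """Record a write; was_empty is the file state before this write."""
        if path in self._tainted:
            return
        if path not in self._writer:
            if was_empty and is_local:
                self._writer[path] = pid   # written from empty by one process
            else:
                self._tainted.add(path)    # pre-existing data or remote disk
        elif self._writer[path] != pid:
            del self._writer[path]         # a second writer appeared
            self._tainted.add(path)

    def needs_scan(self, path, pid):
        """True unless pid is the sole writer of a qualifying file."""
        return self._writer.get(path) != pid
```

Any other process opening the file, or any second writer touching it,
falls back to the normal scan path.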

-- 
Douglas Leeder

--
