Hello,
Miklos Szeredi wrote:
The performance problem with event notification is usually its
scalability, not the cost of each notification. It's nice to optimize
that too, but I don't think it weighs much, especially for FUSE. Doing it
the request/reply way could raise scalability concerns; please see below.
> Also there's again the question of userspace filesystem messing with
That would simply be a broken poll implementation just as O_NONBLOCK
read can block in ->read forever.
> So I'd still argue for the simple POLL-request/POLL-notify protocol on
Given that the number of in-flight requests is not too high, I think
linear search is fine for now but switching it to b-tree shouldn't be
difficult.
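
To make the cost concrete, here is a minimal sketch of the kind of linear
lookup I mean, assuming requests are found by a per-request id. The
struct and function names are made up for illustration and are not the
actual FUSE code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-flight request; not the real FUSE structure. */
struct inflight_req {
    uint64_t unique;            /* id assigned when the request is queued */
    struct inflight_req *next;  /* singly linked list of in-flight requests */
};

/* Push a request onto the front of the in-flight list. */
static void add_inflight(struct inflight_req **head, struct inflight_req *req)
{
    assert(head != NULL && req != NULL);
    req->next = *head;
    *head = req;
}

/* Linear scan, O(n) in the number of in-flight requests. */
static struct inflight_req *find_inflight(struct inflight_req *head,
                                          uint64_t unique)
{
    for (struct inflight_req *r = head; r != NULL; r = r->next)
        if (r->unique == unique)
            return r;
    return NULL;
}
```

As long as the list stays short the scan is cheap, and only
find_inflight() would need to change if a tree replaced the list.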
So, pros for req/reply approach.
* Fewer context switches per event notification.
* No need for separate async notification mechanism.
Cons.
* More interface impedance matching from libfuse.
* Higher overhead when poll/select finishes. Either all outstanding
requests need to be cancelled using INTERRUPT whenever poll/select
returns or kernel needs to keep persistent list of outstanding polls
so that later poll/select can reuse them. The problem here is that
kernel doesn't know when or whether they'll be re-used. We can put
in LRU-based heuristics but it's getting too complex. Note that
it's different from userland server keeping track. The same problem
exists with userland based tracking but for many servers it would be
just a bit in existing structure and we can be much more lax on
userland, i.e. actual storage-backed files usually don't need
notification at all as data is always available, so the amount of
overhead is limited in most cases but we can't assume things like
that for the kernel.
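
A rough sketch of what "just a bit in existing structure" could look
like on the server side. All names here are hypothetical, and the
counter only stands in for whatever the real async notification call
ends up being.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-file state kept by a userland server. */
struct my_file {
    unsigned int ready_events;  /* events currently available */
    bool poll_armed;            /* the one bit: kernel wants a notification */
    int notify_count;           /* placeholder for sending an async notify */
};

/* POLL request: report the current mask and remember the interest. */
static unsigned int handle_poll(struct my_file *f)
{
    assert(f != NULL);
    f->poll_armed = true;
    return f->ready_events;
}

/* File state changed: notify once if a poll is armed. */
static void file_event(struct my_file *f, unsigned int events)
{
    assert(f != NULL);
    f->ready_events |= events;
    if (f->poll_armed) {
        f->poll_armed = false;  /* one-shot, nothing to cancel explicitly */
        f->notify_count++;      /* the real server would notify the kernel */
    }
}
```

A stale armed bit costs at most one spurious notification, which is why
userland can afford to be lax where the kernel cannot.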
Overall, I think being lazy about cancellation and letting userland notify
asynchronously would be better in terms of both performance and
simplicity. What do you think?
Thanks.
--
tejun
