On 7/25/07, Andrew Morton <akpm@linux-foundation.org> wrote:
<nod> Could be what I'm noticing, but it's important to note that as
others have shown improvement with Con's swap prefetch, it's easily
arguable that targeting just swap is good enough for a first
approximation.
Yes, that's a fair transformation / generalization. It's always nice
talking to someone with more clarity than one's self.
Okay, let's run with that for argument's sake.
I've always thought your sense of humor was underappreciated.
So in your proposed scheme, would userspace be polling, er, <goes and
looks through email for maps2 stuff, only finds Rusty's patches to
it>, well, /proc/<pids>/something_or_another?
A userspace daemon that wakes up regularly to poll a bunch of proc
files fills me with glee. Wait, is that glee? I think, no... wait...
horror, yes, horror is what I'm feeling.
I'm wrong, right? I love being wrong about this kind of stuff.
Oy. I mean this in the most respectful way possible, but you're too
smart for your own good.
I mean, sure, it's possible one could have multiply-chained transient
workloads, each with its own optimal working set that has little
overlap with the previous one. Mainframes made their names on such
loads. Working set A starts, generates data, finishes and invokes
working set B, which shares nothing with A except said data. B
finishes and invokes C, etc.
So, yeah, that's way too complex to stuff into the kernel. Even if it
were possible to do so, I cringe at the thought. And I can't believe
that would be a common enough pattern nowadays to justify any
heuristics on anyone's part. It's certainly complex enough that I'd
like to punt that scenario out of the conversation entirely -- I think
it has the potential to give a false impression of how involved a
process we're talking about here.
Let's go back to your restatement:
I'll take an 80% solution for that one problem, and happily declare
that the kernel's job is done. In particular, when a resource hog
exits (or whatever heuristics prefetch is currently hooking into),
the kernel (or userspace, if that interface could be made sane) could
exercise a completely workload agnostic refetch of the last n things
evicted, where n is determined by what's suddenly become free (or
whatever Con came up with).
Just, y'know, MRU style.
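To make the "refetch the last n things evicted, MRU style" idea concrete, here is a minimal userspace sketch in C. It is not Con's swap prefetch code or any kernel API; the struct, the functions, the page "ids" and the history size are all hypothetical. It keeps a small ring of recently evicted ids and, when `freed` pages suddenly become available, hands back up to `freed` of them, newest first.

```c
#include <stddef.h>

/* Hypothetical sketch only: not a kernel interface. */
#define EVICT_HISTORY 8

struct evict_history {
    unsigned long ids[EVICT_HISTORY]; /* hypothetical evicted-page ids */
    size_t head;                      /* next slot to write */
    size_t count;                     /* valid entries, <= EVICT_HISTORY */
};

/* Remember an eviction; when full, the oldest entry is overwritten. */
static void record_eviction(struct evict_history *h, unsigned long id)
{
    h->ids[h->head] = id;
    h->head = (h->head + 1) % EVICT_HISTORY;
    if (h->count < EVICT_HISTORY)
        h->count++;
}

/* When a resource hog exits and `freed` pages become available, copy up to
 * `freed` of the most recently evicted ids into `out`, newest first (MRU).
 * The chosen ids are removed from the history. Returns how many were chosen. */
static size_t pick_refetch(struct evict_history *h, size_t freed,
                           unsigned long *out)
{
    size_t n = freed < h->count ? freed : h->count;

    for (size_t i = 0; i < n; i++) {
        /* step head back one slot to reach the newest remaining entry */
        h->head = (h->head + EVICT_HISTORY - 1) % EVICT_HISTORY;
        out[i] = h->ids[h->head];
    }
    h->count -= n;
    return n;
}
```

For example, after recording ids 1 through 10, `pick_refetch(&h, 3, out)` yields 10, 9, 8: the history is workload agnostic, and the only tuning input is how much memory just became free.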
We're talking about patching the kernel for whatever API you're coming
up with to repopulate pagecache, swap, and inodes, aren't we? If we
are, it doesn't seem like we're saving any work here. Also we're
talking about creating a new user-visible API instead of augmenting
a pre-existing heuristic -- page replacement -- that the kernel
doesn't export and so can change at a moment's notice. Augmenting an
opaque heuristic seems a lot more friendly to long-term maintenance.
Eh, dunno. Maybe?
We're assuming we come up with an API for userspace to get
notifications of evictions (without polling, though poll() would be
fine -- you know what I mean), and an API for un-evicting those things
on demand. If you think that adding that API and maintaining it is
simpler/better than including a variation on the above heuristic I
offered, then yeah, I guess we are. It'll all have that vague
userspace s2ram odor about it, but I'm sure it could be made to work.
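For a sense of what the userspace side of such an eviction-notification API might look like, here is a sketch of a daemon loop that sleeps in poll() rather than waking up periodically to scan /proc files. No such kernel interface exists; the file descriptor, and the record format of one `unsigned long` per evicted page, are assumptions, and a plain pipe stands in for the notification source when trying it out.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical consumer: `fd` is an assumed eviction-notification source that
 * delivers one unsigned long per evicted page. Sleeps in poll() until data
 * arrives, then collects records until the writer closes its end, the timeout
 * expires with nothing new, or `max` records are read.
 * Returns the number of records read, or -1 on error. */
static long drain_evictions(int fd, unsigned long *out, size_t max,
                            int timeout_ms)
{
    size_t n = 0;

    while (n < max) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int r = poll(&pfd, 1, timeout_ms);

        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0)
            break; /* timed out: no new evictions reported */

        ssize_t got = read(fd, &out[n], sizeof out[n]);
        if (got == 0)
            break; /* writer closed its end: end of stream */
        if (got < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if ((size_t)got != sizeof out[n])
            return -1; /* partial record: not handled in this sketch */
        n++;
    }
    return (long)n;
}
```

The point of the design is that the daemon costs nothing while idle; whatever it does with the ids afterwards (asking the kernel to bring them back in) would need the second, equally hypothetical, API.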
As I think I've successfully Peter Principled my way through this
conversation to my level of incompetence, I'll shut up now.
Ray