Re: demand-loading etc

Previous thread: Linux vs. 486/33 by John T Kohl on Saturday, November 16, 1991 - 4:03 am. (2 messages)

Next thread: alternative to paging by Peter MacDonald on Saturday, November 16, 1991 - 9:07 pm. (1 message)
To: <linux-activists@...>
Date: Saturday, November 16, 1991 - 8:28 pm

I finally bit the bullet. Yup, I blew away my dos partition and
put Linux on it. Now I would like to start a project. I consider
virtual consoles to be high priority, but until init/login etc is
done, there is probably no use starting that.

Thus I am proposing to look at demand paging from the file system.
If Linus agrees to consider adding it to linux when it is done,
and nobody successfully shoots this proposal down, I will start
tuit suite.

If someone else wants to help (or do all of it) let me know.
I have broken it down into phases to clarify understanding,
not necessarily to imply they might be released in this order (if ever).
If you think this is a house of cards, let me know ASAP.

Proposed Design:

Phase 1:
- Upon loading an executable, create a map that is stored in the
process that locates all blocks on disk. Do not look at fs again.
- Load only the first 4K page and execute.
- Upon a code page fault load the required 4 blocks into ram.
- Make no attempt to lock file image (count on seg violation?)

Phase 2:
- Attempt to share executable images in ram (shared-text).

Phase 3:
- Attempt to implement the sticky bit, to pin an executable
in memory once loaded.
- Find a way to flush it (all) from memory when done.

Phase 4:
- Attempt to manage working sets in memory if data requirements
exceed available ram (down to ~15%).

Phase 5:
- Paging (writing) data to a partition or fixed size file.
- Locking paged image file.
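A minimal C sketch of Phase 1, assuming 1K file-system blocks (so one 4K page is 4 blocks) and an in-memory array standing in for the disk. `struct exec_map`, `load_page` and `disk_read_block` are hypothetical names, not existing linux code. The map is filled once at exec time; a fault on text offset `addr` then reads the four blocks behind that page without going back to the file system.

```c
#include <string.h>

#define BLOCK_SIZE      1024                      /* assumed 1K fs blocks */
#define PAGE_SIZE       4096
#define BLOCKS_PER_PAGE (PAGE_SIZE / BLOCK_SIZE)  /* = 4 */
#define MAX_BLOCKS      64                        /* hypothetical limit: 64K image */
#define DISK_BLOCKS     256

/* Hypothetical in-memory "disk" standing in for the real block device. */
static unsigned char disk[DISK_BLOCKS][BLOCK_SIZE];

/* Per-process map built once at exec time: file block i lives at blocks[i]. */
struct exec_map {
    unsigned nblocks;
    unsigned short blocks[MAX_BLOCKS];
};

static void disk_read_block(unsigned short nr, unsigned char *buf)
{
    memcpy(buf, disk[nr], BLOCK_SIZE);
}

/* Called on a code page fault at text offset addr. Loads the 4 blocks
   backing that page; blocks past the end of the image are zero-filled.
   Returns -1 for a fault beyond the image (a genuine seg violation). */
static int load_page(const struct exec_map *map, unsigned long addr,
                     unsigned char *page)
{
    unsigned long first = (addr / PAGE_SIZE) * BLOCKS_PER_PAGE;
    int i;

    if (first >= map->nblocks)
        return -1;
    for (i = 0; i < BLOCKS_PER_PAGE; i++) {
        unsigned char *dst = page + i * BLOCK_SIZE;
        if (first + i < map->nblocks)
            disk_read_block(map->blocks[first + i], dst);
        else
            memset(dst, 0, BLOCK_SIZE);
    }
    return 0;
}
```

The cost of this design is the per-process map itself (the first item under Issues), and the map silently goes stale if the file is rewritten while the program runs, which is what the "count on seg violation?" note in Phase 1 is gambling on.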

Issues:

- Allocating/deallocating memory for the program maps.
- Enable/disable paging when booting from shoelace?
- Do not use working set with pinned pages?
- File locks held in ram only?

To: <linux-activists@...>
Date: Saturday, November 16, 1991 - 2:09 pm

One thing could be a problem with demand-loaded executables: finding
the page on disk means going through the file system, whereas paging
from a dedicated swapping partition could be a simpler and faster
operation. Admittedly, the first read must be done from the
executable, and demand loading from the file probably makes the load
process seem faster.

--
Johan Myreen
jem@cs.hut.fi
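To put a rough number on Johan's point, the sketch below compares locating a page in a swap partition (pure arithmetic) with locating a file block through the inode. It assumes a Minix v1-style inode (7 direct zones, then one single-indirect and one double-indirect zone, 16-bit zone numbers, 1K blocks); the function names are hypothetical, and the counts are worst cases, since the buffer cache may already hold the indirect blocks.

```c
#define BLOCK_SIZE    1024
#define PAGE_SIZE     4096
#define NR_DIRECT     7                  /* assumed Minix v1 layout */
#define ZONES_PER_BLK (BLOCK_SIZE / 2)   /* 16-bit zone numbers: 512 per block */

/* Swap partition: a page's first block is computed, no extra reads. */
static unsigned long swap_block(unsigned long swap_start, unsigned long slot)
{
    return swap_start + slot * (PAGE_SIZE / BLOCK_SIZE);
}

/* File system: how many indirect blocks may have to be read before we
   even know where file block n lives (-1 = beyond the maximum file size). */
static int fs_lookup_reads(unsigned long n)
{
    if (n < NR_DIRECT)
        return 0;                        /* zone number is in the inode */
    n -= NR_DIRECT;
    if (n < ZONES_PER_BLK)
        return 1;                        /* read the single-indirect block */
    n -= ZONES_PER_BLK;
    if (n < (unsigned long)ZONES_PER_BLK * ZONES_PER_BLK)
        return 2;                        /* double indirect: two blocks */
    return -1;
}
```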

To: <pmacdona@...>
Cc: <linux-activists@...>
Date: Saturday, November 16, 1991 - 11:47 pm

Everyone should note that this is not the original meaning of the sticky
bit, in the BSD 4.3 sense. In the BSD sense, what the sticky bit means
is that after the program exits, its entry in the text table is not
purged and its swap space is not reclaimed. The program was _not_
locked into memory, and would get swapped out if demands were placed on
the VM system. The rationale behind this is that in a time-sharing
environment, programs like GNU emacs are almost always in use by *someone*, and
in the case where the last person finishes running emacs, the program
should not be flushed from the text cache, swap space, and ram, in the
hope that another user would be starting an emacs before the unused
program got swapped out to disk. In that case, the user would win since
no disk I/O would be needed. However, in a single-user environment,
the value of the sticky bit is rather dubious, and Project Athena
workstations don't even bother using it for this reason.

That being said, I'm not too sure that pinning an executable into memory
is a good idea. Unless you have gobs and gobs of memory, you wouldn't
be able to "sticky bit" more than one or two programs before you run out
of physical memory, and in the meantime, locking down large amounts of
memory would increase the amount of VM thrashing when other programs
are running.

A better idea might be to not flush the text table entry once its ref
count goes to zero, but to rather wait until the last of its text pages
are paged out (which will happen sooner or later because there is
no program using the text). However, if the text is used by a process
before the last pages are reclaimed then you will win because some
number of pages won't need to be brought back in from disk. Thus, this
system will do the right thing for someone who has enough memory to keep
gcc, make, etc. in core at the same time during a build. The advantage
that this scheme has is that it will automatically adapt to whatever
text image is being repeatedly used ...
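A minimal sketch of the text-table policy described above, in C. The table layout, the field names and the functions (`text_get`, `text_put`, `text_page_in`, `text_page_out`) are hypothetical; a real table would also hold the page list and swap locations. The only point illustrated is the rule: a reference count of zero alone does not free an entry; it is purged only when no process uses it *and* its last resident page has been paged out.

```c
#include <stddef.h>

#define NR_TEXT 8

/* Hypothetical text-table entry, one per executable whose text is in core. */
struct text {
    int inode_nr;   /* executable this text belongs to; 0 = free slot */
    int refcount;   /* processes currently running it */
    int resident;   /* its text pages still in RAM */
};

static struct text text_table[NR_TEXT];

/* Purge only when nobody runs it AND no page of it is left in RAM. */
static void text_maybe_free(struct text *t)
{
    if (t->refcount == 0 && t->resident == 0)
        t->inode_nr = 0;
}

/* exec(): reuse a lingering entry if there is one; else take a free slot. */
static struct text *text_get(int inode_nr)
{
    struct text *free_slot = NULL;

    for (int i = 0; i < NR_TEXT; i++) {
        if (text_table[i].inode_nr == inode_nr) {
            text_table[i].refcount++;
            return &text_table[i];
        }
        if (text_table[i].inode_nr == 0 && free_slot == NULL)
            free_slot = &text_table[i];
    }
    if (free_slot != NULL) {
        free_slot->inode_nr = inode_nr;
        free_slot->refcount = 1;
        free_slot->resident = 0;   /* pages arrive later, on demand */
    }
    return free_slot;              /* NULL if the table is full */
}

static void text_page_in(struct text *t) { t->resident++; }

/* exit(): drop the reference, but keep the entry while pages remain. */
static void text_put(struct text *t)
{
    t->refcount--;
    text_maybe_free(t);
}

/* Pager evicted one text page. */
static void text_page_out(struct text *t)
{
    if (t->resident > 0)
        t->resident--;
    text_maybe_free(t);
}
```

If the same executable is started again before the pager gets to it, `text_get` finds the lingering entry and its resident pages come back for free, which is how the scheme adapts to whatever images (gcc, make, ...) are being reused.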

To: <Linux-activists@...>
Date: Sunday, November 17, 1991 - 6:36 am

You don't start small, do you :-). If I /agree/ to add it to linux? If
anybody implements paging, he's going to get 2 extra copies of linux
for free. How's that for an offer?

Seriously, adding demand-loading should be relatively easy. I wouldn't
suggest going past the filesystem, even for saving the block-numbers
somewhere (and a bit-map won't do it, block nr must be ordered). Having
the inode-pointer in the process table entry (and not releasing it
before an exit), and using that to find the blocks wouldn't be too hard.
A bit slower, but conceptually much easier. Then you can use the
routines already on hand (map() etc).

Note that the "relatively easy" must be taken with a bit of salt: you
have to add the routines to the paging unit etc etc. No major problems
at least. Also I'd like to keep linux simple even at the cost of some
speed hit, as otherwise it grows until nobody really understands it. I'm
kinda proud of my mm: it's not many lines of code (although it's not
very clear code.)

Re: sticky bits, shared text...

I don't like sticky bits (in the meaning that they lock something into
memory). I doubt it's really that useful on a small machine, that is
essentially single-user. It's easy to grow the cache to 6M or more if
you have the memory, and currently I don't see much unnecessary disk I/O
in a heavy 'make'. Besides - sticky bits are hard to keep track of.

Shared text and sticky bits have another shared problem: right now
linux allows writes to the code segment. This isn't a very big problem,
as the changes to 'mm' are minimal, but you'd have to check that the
code segment is a multiple of 4096 (I /think/ I made ld do this, but I'm
not sure).
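Sharing or write-protecting text only works if no page holds both code and data, hence the 4096 check. A sketch, assuming the sizes come from an a.out-style header (the field names `a_text`/`a_data` follow a.out, but `struct exec_sizes` is a hypothetical subset), of the check the kernel could make and of the rounding ld would have to apply:

```c
#include <stdint.h>

#define PAGE_SIZE 4096UL

/* Hypothetical subset of an a.out header: only the sizes matter here. */
struct exec_sizes {
    uint32_t a_text;   /* bytes of code */
    uint32_t a_data;   /* bytes of initialised data */
};

/* True if the text ends on a page boundary, so no page mixes code and data. */
static int text_is_page_aligned(const struct exec_sizes *ex)
{
    return (ex->a_text & (PAGE_SIZE - 1)) == 0;
}

/* The padding ld would apply: round a size up to the next 4096 boundary. */
static uint32_t round_to_page(uint32_t size)
{
    return (uint32_t)((size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1));
}
```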

The biggest problem however is the amount of data you have to keep track
of. You'll have to add a lot of structures to know which pages are in
which executable etc. I don't think it's worth it, especially if real
paging (with a partition of its own) is implemented.

Yes. The right way to do this is to add an f...

To: <Linux-activists@...>
Date: Monday, November 18, 1991 - 2:09 am

Shared text helps a lot with recursive commands. I'm surprised linux
doesn't already have it. Fork is most naturally done by sharing text and

The cache i/o that you don't see hurts (there should be a LED for it :).
Building the library under Minix-386-cached-text takes 25% longer when
the shell is bash. The overhead is actually for something else - copying
40K data from the cache and zeroing 200K bss. Under another version of
Minix with copy-on-access forks, building the library takes another 10%
longer. There is only slightly less copying because a lot of text gets
copied, and more overhead from page faults.

The other thing that hurts without cached text is that heavily-used programs
will be duplicated in the disk cache and as text. Perhaps mapping the disk
cache to text instead of copying it would be almost as good as managing
text separately.

You would need a printf-emulator in the kernel :-(. The floating point

The djgpp emulator (in 32-bit C) is 14+ times slower than my library
routines (in 32-bit asm) (the emulator in Turbo C++ is only 7 times
slower :-). It takes a large amount of code to do the emulation compared

This doesn't explain why the slow mode worked to boot. Perhaps the fast
mode is done in software ;-). I guess the bug is really in the BIOS+linux

The error seems normal. You are lucky to get it instead of a crash. My
1987 Award BIOS and 1990 AMI BIOS can be relied on to set up the %CR0 bits
_incorrectly_ (Award always clears them, on a machine without an x87).
After booting, AMI on my 486 has set %CR0 to 0x10. The relevant bits are

0x02: MP (Math Present): should be set (WRONG)
0x04: EM (Emulation): should be clear to use x87/486; this is what
      gives the device not available trap, so linux must be setting it
0x10: ET (Extension Type): should be set (always set by 486 h/w, so the
      BIOS gets this one right :)
0x20: NE (Numeric Error): 486 s/w should use this to get error reporting
...
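For reference, a small C sketch of the low CR0 bits at the positions given in Intel's 386/486 documentation (MP = 0x02, EM = 0x04, ET = 0x10, NE = 0x20). With these, the value 0x10 the AMI BIOS leaves behind decodes as ET only: MP clear, EM clear, and PE clear since it is still real mode. The helper names are hypothetical, and setting NE only on a 486 is an assumption of the sketch.

```c
/* Low CR0 bits (Intel 386/486 positions). */
#define CR0_PE 0x01UL  /* Protection Enable */
#define CR0_MP 0x02UL  /* Monitor coprocessor ("Math Present") */
#define CR0_EM 0x04UL  /* Emulate: FP instructions raise "device not available" */
#define CR0_TS 0x08UL  /* Task Switched */
#define CR0_ET 0x10UL  /* Extension Type: 387-style coprocessor */
#define CR0_NE 0x20UL  /* Numeric Error: native FPU error reporting (486) */

/* Is CR0 set up to run FP instructions on a real x87? (MP set, EM clear) */
static int cr0_uses_x87(unsigned long cr0)
{
    return (cr0 & CR0_MP) && !(cr0 & CR0_EM);
}

/* Hypothetical fix-up an OS could apply after detecting a coprocessor. */
static unsigned long cr0_for_x87(unsigned long cr0, int is_486)
{
    cr0 |= CR0_MP;
    cr0 &= ~CR0_EM;
    if (is_486)
        cr0 |= CR0_NE;   /* assumption: use native error reporting on a 486 */
    return cr0;
}
```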

To: <Linux-activists@...>
Date: Sunday, November 17, 1991 - 11:47 pm

Did you mean implementing a 387/487 emulator, or something specific for
the gcc soft-float routines? I was wondering what sort of speed hit you
would take (in either case) if each floating point operation required a
trap to the kernel. That's why my previous message suggested mapping
certain pages into each process's address space, so that calling the
FP routines wouldn't require a context switch.

I was thinking, however, that another, possibly more elegant solution
would be to assign shared libraries (including the FP routines) to a
segment which would be visible to all processes. Then all the stub
routines would need to do is to do a far call to a predefined segment.
What do people think?
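C has no portable way to express a far call, so this sketch models the "predefined segment" as a table of function pointers at fixed, published indices that every process could see; the libc.a stubs reduce to one indirect call through it. All names (`library_segment`, `stub_sqrt`, the soft-float routines) are hypothetical, and the Newton-iteration square root is only a placeholder for real FP routines.

```c
/* Fixed, published entry indices in the hypothetical library segment. */
enum { LIB_FABS, LIB_SQRT, LIB_NENTRIES };

typedef double (*lib_fn)(double);

/* In the proposed scheme this table would live in a segment visible to all
   processes; here it is an ordinary global filled in by library_init(). */
static lib_fn library_segment[LIB_NENTRIES];

static double soft_fabs(double x)
{
    return x < 0.0 ? -x : x;
}

/* Placeholder soft-float sqrt: Newton iteration from above. */
static double soft_sqrt(double x)
{
    double g;

    if (!(x > 0.0))
        return 0.0;            /* sketch: 0 for x <= 0 or NaN; no inf handling */
    g = x > 1.0 ? x : 1.0;     /* start at or above sqrt(x) */
    for (int i = 0; i < 2000; i++) {
        double next = 0.5 * (g + x / g);
        if (next >= g)
            break;             /* stopped decreasing: converged */
        g = next;
    }
    return g;
}

static void library_init(void)
{
    library_segment[LIB_FABS] = soft_fabs;
    library_segment[LIB_SQRT] = soft_sqrt;
}

/* What a libc.a stub would reduce to: one call through the shared table. */
static double stub_fabs(double x) { return library_segment[LIB_FABS](x); }
static double stub_sqrt(double x) { return library_segment[LIB_SQRT](x); }
```

Because programs only know the table indices, the routines behind them could later be swapped (say, soft-float for 387 versions) without relinking anything.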

- Ted

P.S. Having the kernel emulate 387 instructions would still be neat;
I was just wondering if it would be too slow for normal operations.

To: <Linux-activists@...>
Date: Monday, November 18, 1991 - 7:57 am

System calls under linux never require a context switch. In fact context
switches are extremely rare: they happen ONLY when one process stops
running and another one starts. The floating point exceptions would be
slow, but that's mainly because they would have to decode the effective
addresses etc. Not very much fun, but somewhat interesting.
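Decoding the effective address is the tedious part mentioned above: an emulator trapping on an FP instruction has to parse the ModR/M byte (and possibly a SIB byte and a displacement) to find the memory operand. A sketch for 32-bit addressing only, assuming no segment-override or address-size prefixes and a hypothetical `struct regs` holding the saved general registers in encoding order (a kernel would take them from the trap frame):

```c
#include <stdint.h>

struct regs {
    uint32_t r[8];   /* 0=EAX 1=ECX 2=EDX 3=EBX 4=ESP 5=EBP 6=ESI 7=EDI */
};

/* Little-endian 32-bit displacement. */
static uint32_t read32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* code[0] is the ModR/M byte. Returns the bytes consumed (ModR/M + SIB +
   displacement) and stores the address in *ea; returns 0 for mod == 3,
   the register form (ST(i) for FP instructions), which has no memory operand. */
static int decode_ea(const uint8_t *code, const struct regs *rg, uint32_t *ea)
{
    int mod = code[0] >> 6;
    int rm = code[0] & 7;
    int len = 1;
    uint32_t addr = 0;

    if (mod == 3)
        return 0;

    if (rm == 4) {                        /* SIB byte follows */
        int scale = code[1] >> 6;
        int index = (code[1] >> 3) & 7;
        int base = code[1] & 7;
        len = 2;
        if (index != 4)                   /* index 4 means "no index" */
            addr += rg->r[index] << scale;
        if (base == 5 && mod == 0) {      /* no base register, disp32 */
            addr += read32(code + len);
            len += 4;
        } else {
            addr += rg->r[base];
        }
    } else if (rm == 5 && mod == 0) {     /* absolute disp32 */
        addr = read32(code + len);
        len += 4;
    } else {
        addr = rg->r[rm];
    }

    if (mod == 1) {                       /* sign-extended disp8 */
        addr += (uint32_t)(int8_t)code[len];
        len += 1;
    } else if (mod == 2) {                /* disp32 */
        addr += read32(code + len);
        len += 4;
    }
    *ea = addr;
    return len;
}
```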

Still, there are a lot of reasons to have a FP-emulator in the kernel.
If somebody wants fast floating point, he'd better get a 387, but I'd
like to be able to support all programs on all machines independent of
the 387. Currently the library is soft-float, which means that you
cannot reliably use it with a program that has been compiled with
"-m80387". Big drawback, as is the possibility that someone has a

This is the preferred solution: it's simple and easy to add to the
kernel. The routines in libc.a would just be stubs calling the "library
segment". No problem, except for the math.

Also, I don't find the math-instructions that time-critical: they are
relatively few in most programs. If you do number-crunching, you have a
387 anyway (as they are quite inexpensive nowadays - I got one just to
be able to test the kernel routines).

Kermit has a problem with ^C. I didn't even try to fix this, as I didn't
know if it should be fixed. Anybody know? It traps it nicely, and
exits, but somehow I expected kermit to ignore ^C when in terminal
mode. Oh well, I ported it so that I could download files, and it works
for that.

It's not the lack of a 80387 (and don't get a 287, I'm not sure I can
support it... working on it). It seems more like a corrupted filesystem,
but I have been known to be wrong (sometimes ;-). I'll post the (new)
fsck with binaries to nic some time today, and they might show up
sometime. Still even more beta than the system :-).

The "general protection violation" is a general error: it happens at
most programming errors (or if you try to use minix binaries, or if the
executable file is corrupted). You can see wh...
