In an attempt to understand the numbers found when viewing '/proc/slabinfo' on the 2.4 kernel [forum], Doug Ledford read through the source code. He then posted a summary to the lkml, sharing what he'd learned. He says, "I wrote this by looking at the 2.4 kernel sources, so it may not be totally accurate as far as 2.6 is concerned, but since there is no documentation on it at the moment, it's certainly better than what's there." Read on for Doug's full explanation.
Those interested in slabinfo will also want to check out the recently released slabtop utility [story], which provides detailed, real time kernel slab cache information.
From: Doug Ledford [email blocked]
Subject: New: Documentation/vm/slabinfo.txt
Date: Sat, 01 Nov 2003 07:01:09 -0500
I wrote this by looking at the 2.4 kernel sources, so it may not be
totally accurate as far as 2.6 is concerned, but since there is no
documentation on it at the moment, it's certainly better than what's
there.
------Snip--------
> cat /proc/slabinfo:
> inode_cache 423370 423556 512 60482 60508 1 : 248 62
> dentry_cache 435756 436260 128 14526 14542 1 : 504 126
Hmmm...those lines looked suspicious, but I didn't really know how to
read the slabinfo output. So I just went and read up on it. For those
people like me that didn't know what all these numbers mean, here's my
understanding after reading the source:
            active-objects
            |      allocated-objects
            |      |      object-size
            |      |      |   active-slab-allocations
            |      |      |   |     total-slab-allocations
            |      |      |   |     |     alloc-size
            |      |      |   |     |     |
inode_cache 423370 423556 512 60482 60508 1 : 248 62
                                              |   |
                                            limit |
                                                  batch-count
active-objects: after creating a slab cache, you allocate your objects
out of that slab cache. This is the count of objects you currently have
allocated out of the cache.
allocated-objects: this is the current total number of objects in the
cache.
object-size: this is the size of each allocated object. There is
overhead to maintaining the cache, so with a 512-byte object and a
4096-byte page size, you could fit 7 objects in a single page and you
would waste (512 - slab_overhead) bytes per allocation. Slab overhead
varies with object size (smaller objects have more objects per
allocation and require more overhead to track used vs. unused objects).
You can determine how many objects are being put on each allocation
chunk by dividing allocated-objects by total-slab-allocs.
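As a quick check of that division, here is a minimal sketch using the inode_cache figures quoted earlier. The variable names are made up for illustration, and the 4096-byte page is an assumption (the line reports an alloc-size of 1 page, and 4k is only the most common page size).

```python
# Figures taken from the inode_cache line quoted in the post.
allocated_objects = 423556
total_slab_allocs = 60508
object_size = 512      # bytes
page_size = 4096       # bytes; assumed 4k page, alloc-size of 1 page

# Objects placed on each allocation chunk.
objects_per_alloc = allocated_objects // total_slab_allocs

# Bytes per allocation not occupied by objects (slab overhead plus waste).
leftover = page_size - objects_per_alloc * object_size

print(objects_per_alloc)  # 7
print(leftover)           # 512
```

The 512 leftover bytes are where the slab bookkeeping has to fit; whatever the bookkeeping does not use is the per-allocation waste mentioned above.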
active-slab-allocs: This is the number of allocations that have at least
one of that allocation's objects in use.
total-slab-allocs: The total number of allocations in the current slab
cache.
alloc-size: This is the size of each allocation in units of memory
pages. Page size is architecture specific, but the most common size is
4k. A couple architectures have an 8k page size, and ia64 can do a 16k
page size. Each allocation for the cache is alloc-size * arch_page_size
bytes at a time, and total memory used by this particular slab cache is
total_slab_allocs * alloc_size * arch_page_size.
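The memory formula above can be worked through for the two caches quoted at the top of the post. This is only a sketch: the function name is hypothetical, and the 4096-byte default page size is an assumption that will be wrong on architectures with larger pages.

```python
def slab_cache_bytes(total_slab_allocs, alloc_size, arch_page_size=4096):
    """Total memory held by one slab cache, in bytes.

    arch_page_size defaults to 4096, which is an assumption; pass the
    real page size for the machine the numbers came from.
    """
    return total_slab_allocs * alloc_size * arch_page_size

# total-slab-allocs and alloc-size from the quoted slabinfo lines.
inode_bytes = slab_cache_bytes(60508, 1)
dentry_bytes = slab_cache_bytes(14542, 1)

print(inode_bytes)   # 247840768
print(dentry_bytes)  # 59564032
```

On a 4k-page machine that is roughly 236 MiB for inode_cache and about 57 MiB for dentry_cache.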
The last 2 items are SMP specific and don't show up at all on UP
kernels. On SMP machines, the slab cache will keep a per CPU cache of
objects so that an object freed on CPU0 will be reused on CPU0 instead
of CPU1 if possible. This improves cache performance on SMP systems
greatly.
limit: This is the limit on the number of free objects that can be
stored in the per-CPU free list for this slab cache.
batch-count: On SMP systems, when we refill the available object list,
instead of doing one object at a time, we do batch-count objects at a
time.
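For anyone who wants to pull these fields out of a line programmatically, here is a rough sketch based only on the layout described above (2.4 kernel, slab statistics disabled). The `SlabLine` type and `parse_slab_line` function are made-up names, not part of any kernel or library interface, and the parser does not handle the longer statistics-enabled lines.

```python
from typing import NamedTuple, Optional


class SlabLine(NamedTuple):
    name: str
    active_objects: int
    allocated_objects: int
    object_size: int
    active_slab_allocs: int
    total_slab_allocs: int
    alloc_size: int             # in pages
    limit: Optional[int]        # SMP only
    batch_count: Optional[int]  # SMP only


def parse_slab_line(line: str) -> SlabLine:
    # Everything before the colon is common to UP and SMP kernels.
    head, _, tail = line.partition(":")
    fields = head.split()
    name = fields[0]
    nums = [int(x) for x in fields[1:7]]
    # On UP kernels there is no colon, so tail is empty and both are None.
    smp = [int(x) for x in tail.split()[:2]]
    limit, batch_count = (smp + [None, None])[:2]
    return SlabLine(name, *nums, limit, batch_count)


info = parse_slab_line("inode_cache 423370 423556 512 60482 60508 1 : 248 62")
print(info.name, info.active_objects, info.limit, info.batch_count)
# inode_cache 423370 248 62
```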
One last thing, if slab statistics are enabled, then you'll get more
numbers on each line and the lines will look like this:
UP
name active-objects total-objects object-size active-allocs total-allocs
alloc-size: high-size num-allocations times-grown allocs-reaped errors
SMP
name active-objects total-objects object-size active-allocs total-allocs
alloc-size: high-size num-allocations times-grown allocs-reaped errors:
limit batch-count: alloc-hits alloc-misses free-hits free-misses
For the most part, the statistics numbers are pretty self explanatory,
so I won't bother with them.
HTH
--
Doug Ledford [email blocked] 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
documentation
After reading this, I still have no idea what a 'slab cache' is or what relevance it has to the system. What are these 'objects' and why should I care at all about their statistics?
Here's my take on it, but I don't know about kernel internals, so this is all pure speculation:
The kernel needs to temporarily store data from various operations it performs on certain system objects (filesystem, network, process, memory...), for instance filesystem i-nodes, file blocks, network packets, etc. I think the slab is a limited chunk of memory used to store these "objects", so that the kernel doesn't need to retrieve the data from the device each time it needs to consult it.
Perhaps it is even the main kernel memory pool.
The slab info seems to reflect these ideas by providing statistics about such objects: I can spot network (nfs*, tcp*, ip*, arp*, sock), filesystem (devfsd, file_lock_cache, journal_head, inode_cache) and other kernel statistics in my slabinfo output.
it's a cache
When you have a list of data you want to store, in user space, traditionally you'd do a malloc for each item and hook the data into a list of the items you are storing.
However, this is highly inefficient in a number of contexts, especially if the data objects are small or they are coming and going quickly. You end up spending a lot of time malloc'ing and free'ing, and this could possibly fragment memory, or worse, fragment the tracking of free and used memory. This can have a huge impact on performance.
An alternative to allocating one data item at a time is to allocate a big chunk of them at once, as an array, and keep a flag for each entry that indicates whether it is in use. Allocating one is now as simple as finding the first free entry. Free'ing is as simple as marking the free'd entry as free. Obviously you run into problems if you need more entries than you originally allocated, but that's usually a much rarer occurrence, so doing something slightly less efficient there is not a big deal.
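Here is a toy sketch of that array-plus-flags idea. It only illustrates the scheme described in this comment, not how the kernel's slab allocator is actually written, and the `FixedPool` class is a made-up name.

```python
class FixedPool:
    """Toy pool: a preallocated array of slots plus one in-use flag per slot."""

    def __init__(self, capacity):
        self.slots = [None] * capacity      # the "big chunk", allocated once
        self.in_use = [False] * capacity    # one flag per entry

    def alloc(self, value):
        # Allocating is just finding the first free entry.
        for i, used in enumerate(self.in_use):
            if not used:
                self.in_use[i] = True
                self.slots[i] = value
                return i
        # The rare case: more entries needed than were preallocated.
        raise MemoryError("pool exhausted")

    def free(self, index):
        # Freeing is just marking the entry as free again.
        if not self.in_use[index]:
            raise ValueError("entry is not allocated")
        self.in_use[index] = False
        self.slots[index] = None


pool = FixedPool(4)
a = pool.alloc("first")
b = pool.alloc("second")
pool.free(a)
c = pool.alloc("third")   # reuses the slot that was just freed
print(a, b, c)            # 0 1 0
```

A real implementation would grow the pool instead of raising when it runs out, which is the "slightly less efficient" path mentioned above.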
From what I understand, the slab_cache in the Linux kernel is based on this idea, only instead of allocating one "big chunk" of whatever it is they are allocating, they allocate page_size chunks. They also keep a set of general-purpose caches in fixed sizes:
size-32
size-32(DMA)
size-64
size-64(DMA)
size-96
size-96(DMA)
size-128
size-128(DMA)
size-192
size-192(DMA)
This way, if some object in the kernel knows it needs an array of, say, 60-byte objects, it can allocate them from the size-64 slab (or size-64(DMA) if it needs DMA access to them).
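The lookup amounts to "pick the smallest general cache that fits". A small sketch, assuming only the five sizes listed above (a real system may have more, and this is not the kernel's own code; `general_cache_for` is a hypothetical helper):

```python
# Only the sizes listed in this comment; a real system may have more.
GENERAL_SIZES = [32, 64, 96, 128, 192]


def general_cache_for(nbytes, dma=False):
    """Return the name of the smallest listed cache that fits nbytes."""
    for size in GENERAL_SIZES:
        if nbytes <= size:
            return f"size-{size}(DMA)" if dma else f"size-{size}"
    raise ValueError("request is larger than any size listed here")


print(general_cache_for(60))            # size-64
print(general_cache_for(60, dma=True))  # size-64(DMA)
print(general_cache_for(100))           # size-128
```

Note that a 60-byte object in the size-64 cache wastes 4 bytes per object, which is the price of not having a dedicated cache for every size.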
There's a paper on this that some guy did for a class. Get the PS version for the pictures. It references an original USENIX paper that was likely the source of the Linux kernel implementation.
Link you posted for reference
Hey Sir,
Thanks for the link, but it seems that they have 403'ed
those links now.
Does anybody have a copy they can post somewhere for
download?
Thanks
Link to paper
See http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.26.9588