* Christoph Lameter (clameter@sgi.com) wrote:Will do after doing changes according to your comments. It doesn't ? :) will fix. Yes, will change. I guess it would be more : page_address(c->page) | (c->freeoffset & c->off_mask) where c->off_mask = (PAGE_SIZE << s->order) - 1; So we don't throw away the LSBs. Yep, that would be faster. Note the "from the fastpath point of view", which means that if an interrupt nests over the cmpxchg_local fastpath and uses set_freelist_ptr, all the operations will be executed before the interrupt returns to the cmpxchg_local fastpath, therefore showing the update as being done "atomically" from the cmpxchg_local fastpath point of view. Anyway, I am changing this for : static inline void set_freelist_ptr(struct kmem_cache_cpu *c, void **freelist) { unsigned long offset; offset = c->freeoffset + c->off_mask + 1; offset &= ~c->off_mask; offset |= (unsigned long)freelist & c->off_mask; c->freeoffset = offset; } Which will insure c->freeoffset is always in a valid state, therefore allowing interrupts to nest over it. Should be ok now since we allow interrupts to nest over set_freelist_ptr. But I think this is also protected by slab_lock, taken both in the slow path and in deactivate_slab. In any case, doing a object = get_freelist_ptr(c); set_freelist_ptr(c, object[c->offset]); Like deactivate_slab does should be either protected by disabling interrupts or by the slab_lock. Am I right ? Hrm, there should be a 1 to 1 mapping between the old code and the new code : - } while (cmpxchg_local(&c->freelist, object, object[c->offset]) - != object); + /* No need to increment the MSB counter here, because only + * object free will lead to counter re-use. */ + } while (cmpxchg_local(&c->freeoffset, freeoffset, + (freeoffset + (c->offset * sizeof(void *)))) != freeoffset); With the difference that we use only offsets instead of the full object address. Ah! I think I found the issue. It should become : newoffset = freeoffset; newoffset &= ~c->off_mask; newoffset |= (unsigned long)(object[c->offset]) & c->off_mask; } while (cmpxchg_local(&c->freeoffset, freeoffset, newoffset) != freeoffset); Mathieu new code, compile-tested only : Implement slub fastpath in terms of freebase and freeoffset It allows the cmpxchg_local to detect object re-use by keeping a counter in the freeoffset MSBs. freeoffset & c->off_mask is the offset of the freelist element within the page freeoffset & ~c->off_mask is the counter to detect object re-use.. freebase is the address of the page in this the object is located. It is a multiple of c->off_mask + 1. Whenever an object is freed in the cpu slab cache, the counter is incremented. Whenever the alloc/free slow paths are modifying the freeoffset or freebase, the freeoffset counter is also incremented. It is used to make sure we know if freebase has been modified in an interrupt nested over the fast path. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> CC: Christoph Lameter <clameter@sgi.com> --- include/linux/slub_def.h | 9 +++- mm/slub.c | 103 ++++++++++++++++++++++++++++++++++------------- 2 files changed, 83 insertions(+), 29 deletions(-) Index: linux-2.6-lttng/include/linux/slub_def.h =================================================================== --- linux-2.6-lttng.orig/include/linux/slub_def.h 2008-02-27 20:40:48.000000000 -0500 +++ linux-2.6-lttng/include/linux/slub_def.h 2008-02-28 18:19:42.000000000 -0500 @@ -32,7 +32,14 @@ enum stat_item { NR_SLUB_STAT_ITEMS }; struct kmem_cache_cpu { - void **freelist; /* Pointer to first free per cpu object */ + unsigned long freeoffset; /* + * Offset of the first free per cpu + * object. The MSBs are used as a + * counter which must be incremented + * each time an object is freed and each + * time freebase is changed. + */ + unsigned long off_mask; /* freeoffset offset LSB mask */ struct page *page; /* The slab from which we are allocating */ int node; /* The node of the page (or -1 for debug) */ unsigned int offset; /* Freepointer offset (in word units) */ Index: linux-2.6-lttng/mm/slub.c =================================================================== --- linux-2.6-lttng.orig/mm/slub.c 2008-02-27 20:41:14.000000000 -0500 +++ linux-2.6-lttng/mm/slub.c 2008-02-28 18:22:14.000000000 -0500 @@ -138,6 +138,22 @@ static inline void ClearSlabDebug(struct page->flags &= ~SLABDEBUG; } +static inline void **get_freelist_ptr(struct kmem_cache_cpu *c) +{ + return (void **)((unsigned long)page_address(c->page) + | (c->freeoffset & c->off_mask)); +} + +static inline void set_freelist_ptr(struct kmem_cache_cpu *c, void **freelist) +{ + unsigned long offset; + + offset = c->freeoffset + c->off_mask + 1; + offset &= ~c->off_mask; + offset |= (unsigned long)freelist & c->off_mask; + c->freeoffset = offset; +} + /* * Issues still to be resolved: * @@ -305,7 +321,7 @@ static inline struct kmem_cache_cpu *get * Note that SLUB relies on page_mapping returning NULL for pages with bit 0 * in the mapping set. */ -static inline int is_end(void *addr) +static inline int is_end(long addr) { return (unsigned long)addr & PAGE_MAPPING_ANON; } @@ -1411,7 +1427,7 @@ static void deactivate_slab(struct kmem_ struct page *page = c->page; int tail = 1; - if (c->freelist) + if (is_end(c->freeoffset)) stat(c, DEACTIVATE_REMOTE_FREES); /* * Merge cpu freelist into freelist. Typically we get here @@ -1422,14 +1438,14 @@ static void deactivate_slab(struct kmem_ * be called for a debug slab. Then c->freelist may contain * a dummy pointer. */ - while (unlikely(!is_end(c->freelist))) { + while (unlikely(!is_end(c->freeoffset))) { void **object; tail = 0; /* Hot objects. Put the slab first */ /* Retrieve object from cpu_freelist */ - object = c->freelist; - c->freelist = c->freelist[c->offset]; + object = get_freelist_ptr(c); + set_freelist_ptr(c, object[c->offset]); /* And put onto the regular freelist */ object[c->offset] = page->freelist; @@ -1534,7 +1550,7 @@ load_freelist: goto debug; object = c->page->freelist; - c->freelist = object[c->offset]; + set_freelist_ptr(c, object[c->offset]); c->page->inuse = s->objects; c->page->freelist = c->page->end; c->node = page_to_nid(c->page); @@ -1636,28 +1652,48 @@ static __always_inline void *slab_alloc( */ #ifdef SLUB_FASTPATH + unsigned long freeoffset, newoffset; + c = get_cpu_slab(s, raw_smp_processor_id()); do { - object = c->freelist; - if (unlikely(is_end(object) || !node_match(c, node))) { + freeoffset = c->freeoffset; + /* + * If c->page is changed, freeoffset _must_ have its + * counter incremented. + * read freeoffset before c->page (wrt interrupts). + */ + barrier(); + if (unlikely(is_end(freeoffset) || !node_match(c, node))) { object = __slab_alloc(s, gfpflags, node, addr, c); break; } + object = (void **)((unsigned long)page_address(c->page) + | (freeoffset & c->off_mask)); stat(c, ALLOC_FASTPATH); - } while (cmpxchg_local(&c->freelist, object, object[c->offset]) - != object); + /* No need to increment the MSB counter here, because only + * object free will lead to counter re-use. */ + newoffset = freeoffset; + newoffset &= ~c->off_mask; + newoffset |= (unsigned long)(object[c->offset]) & c->off_mask; + } while (cmpxchg_local(&c->freeoffset, freeoffset, newoffset) + != freeoffset); #else unsigned long flags; local_irq_save(flags); c = get_cpu_slab(s, smp_processor_id()); - if (unlikely(is_end(c->freelist) || !node_match(c, node))) + if (unlikely(is_end(c->freeoffset) || !node_match(c, node))) object = __slab_alloc(s, gfpflags, node, addr, c); else { - object = c->freelist; - c->freelist = object[c->offset]; + object = get_freelist_ptr(c); + /* + * No need to increment freeoffset of PAGE_SIZE, so we simply + * update the freeoffset LSB here. + */ + c->freeoffset &= ~c->off_mask; + c->freeoffset |= (unsigned long)object[c->offset] & c->off_mask; stat(c, ALLOC_FASTPATH); } local_irq_restore(flags); @@ -1779,21 +1815,21 @@ static __always_inline void slab_free(st struct kmem_cache_cpu *c; #ifdef SLUB_FASTPATH - void **freelist; + unsigned long freeoffset, newoffset; c = get_cpu_slab(s, raw_smp_processor_id()); debug_check_no_locks_freed(object, s->objsize); do { - freelist = c->freelist; + freeoffset = c->freeoffset; barrier(); /* * If the compiler would reorder the retrieval of c->page to - * come before c->freelist then an interrupt could - * change the cpu slab before we retrieve c->freelist. We - * could be matching on a page no longer active and put the - * object onto the freelist of the wrong slab. + * come before c->freeoffset then an interrupt could change the + * cpu slab before we retrieve c->freelist. We could be matching + * on a page no longer active and put the object onto the + * freelist of the wrong slab. * - * On the other hand: If we already have the freelist pointer + * On the other hand: If we already have the freeoffset * then any change of cpu_slab will cause the cmpxchg to fail * since the freelist pointers are unique per slab. */ @@ -1801,9 +1837,15 @@ static __always_inline void slab_free(st __slab_free(s, page, x, addr, c->offset); break; } - object[c->offset] = freelist; + object[c->offset] = + (void **)((unsigned long)page_address(c->page) | + (freeoffset & c->off_mask)); stat(c, FREE_FASTPATH); - } while (cmpxchg_local(&c->freelist, freelist, object) != freelist); + newoffset = freeoffset + PAGE_SIZE; + newoffset &= ~c->off_mask; + newoffset |= (unsigned long)object & c->off_mask; + } while (cmpxchg_local(&c->freeoffset, freeoffset, newoffset) + != freeoffset); #else unsigned long flags; @@ -1811,8 +1853,13 @@ static __always_inline void slab_free(st debug_check_no_locks_freed(object, s->objsize); c = get_cpu_slab(s, smp_processor_id()); if (likely(page == c->page && c->node >= 0)) { - object[c->offset] = c->freelist; - c->freelist = object; + object[c->offset] = get_freelist_ptr(c); + /* + * No need to update c->page here, so simply update freeoffset. + */ + c->freeoffset += c->off_mask + 1; + c->freeoffset &= ~c->off_mask; + c->freeoffset |= (unsigned long)object & c->off_mask; stat(c, FREE_FASTPATH); } else __slab_free(s, page, x, addr, c->offset); @@ -1995,7 +2042,8 @@ static void init_kmem_cache_cpu(struct k struct kmem_cache_cpu *c) { c->page = NULL; - c->freelist = (void *)PAGE_MAPPING_ANON; + c->freeoffset = PAGE_MAPPING_ANON; + c->off_mask = (PAGE_SIZE << s->order) - 1; c->node = 0; c->offset = s->offset / sizeof(void *); c->objsize = s->objsize; @@ -2042,8 +2090,7 @@ static struct kmem_cache_cpu *alloc_kmem struct kmem_cache_cpu *c = per_cpu(kmem_cache_cpu_free, cpu); if (c) - per_cpu(kmem_cache_cpu_free, cpu) = - (void *)c->freelist; + per_cpu(kmem_cache_cpu_free, cpu) = (void *)get_freelist_ptr(c); else { /* Table overflow: So allocate ourselves */ c = kmalloc_node( @@ -2064,7 +2111,7 @@ static void free_kmem_cache_cpu(struct k kfree(c); return; } - c->freelist = (void *)per_cpu(kmem_cache_cpu_free, cpu); + set_freelist_ptr(c, (void *)per_cpu(kmem_cache_cpu_free, cpu)); per_cpu(kmem_cache_cpu_free, cpu) = c; } -- Mathieu Desnoyers Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 --
| Arnd Bergmann | SCHED_IDLE documentation |
| david | Re: limits on raid |
| Jan Engelhardt | Re: [PATCH] CodingStyle: multiple updates |
| Ingo Molnar | Re: Rescheduling interrupts |
git: | |
| Russ Brown | git-svn: Branching clarifications |
| Sam Song | Fwd: [OT] Re: Git via a proxy server? |
| Junio C Hamano | Re: More precise tag following |
| Pierre Habouzit | Re: People unaware of the importance of "git gc"? |
| Michael | Virtual interface |
| Stijn | Re: libiconv problem |
| Stefan Beke | mail dovecot: pipe() failed: Too many open files |
| Amaury De Ganseman | "ping: sendto: No buffer space available" when using bittorrent or another p2p |
| Jim Winstead Jr. | Re: Root Disk/Book Disk Compatibility |
| Darren Senn | Re: Elm |
| Seung-Chul Woo | Is it possible to mount GNU HURD file system as DOS in SLS? |
| David Willmore | Re: Intel, the Pentium and Linux |
