Re: Is module refcounting racy?

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
From: Nick Piggin
Date: Thursday, April 1, 2010 - 1:09 am

On Wed, Mar 31, 2010 at 02:14:49PM +1030, Rusty Russell wrote:

Well maybe true (and I'm sure we have more important and frequently hit
bugs), but if we offer a feature it should be without bugs IMO.



Sure.



I think it can be done racelessly with my patch, which is not really too
much overhead. I think if this is considered too much, then we should
either fix code and preferably de-export and remove module_refcount from
drivers, or remove module removal completely.

--
Module refcounting is implemented with a per-cpu counter for speed. However
there is a race when tallying the counter where a reference may be taken
by one CPU and released by another. Reference count summation may then see
the decrement without having seen the previous increment, leading to lower
than expected count. A module which never has its actual reference drop below
1 may return a reference count of 0 due to this race.

Module removal generally runs under stop_machine, which prevents this race
causing bugs due to removal of in-use modules. However there are other real
bugs in module.c code and driver code (module_refcount is exported) where the
callers do not run under stop_machine.

Fix this by maintaining running per-cpu counters for the number of module
refcount increments and the number of refcount decrements. The increments
are tallied after the decrements, so any decrement seen will always have its
corresponding increment counted. The final refcount is the difference of the
total increments and decrements, preventing a low-refcount from being
returned.

Signed-off-by: Nick Piggin <npiggin@suse.de>
---
Index: linux-2.6/include/linux/module.h
===================================================================
--- linux-2.6.orig/include/linux/module.h
+++ linux-2.6/include/linux/module.h
@@ -365,7 +365,8 @@ struct module
 	void (*exit)(void);
 
 	struct module_ref {
-		int count;
+		unsigned int incs;
+		unsigned int decs;
 	} __percpu *refptr;
 #endif
 
@@ -459,9 +460,9 @@ static inline void __module_get(struct m
 {
 	if (module) {
 		preempt_disable();
-		__this_cpu_inc(module->refptr->count);
+		__this_cpu_inc(module->refptr->incs);
 		trace_module_get(module, _THIS_IP_,
-				 __this_cpu_read(module->refptr->count));
+				 __this_cpu_read(module->refptr->incs));
 		preempt_enable();
 	}
 }
@@ -474,11 +475,10 @@ static inline int try_module_get(struct
 		preempt_disable();
 
 		if (likely(module_is_live(module))) {
-			__this_cpu_inc(module->refptr->count);
+			__this_cpu_inc(module->refptr->incs);
 			trace_module_get(module, _THIS_IP_,
-				__this_cpu_read(module->refptr->count));
-		}
-		else
+				__this_cpu_read(module->refptr->incs));
+		} else
 			ret = 0;
 
 		preempt_enable();
Index: linux-2.6/kernel/module.c
===================================================================
--- linux-2.6.orig/kernel/module.c
+++ linux-2.6/kernel/module.c
@@ -473,11 +473,13 @@ static void module_unload_init(struct mo
 	int cpu;
 
 	INIT_LIST_HEAD(&mod->modules_which_use_me);
-	for_each_possible_cpu(cpu)
-		per_cpu_ptr(mod->refptr, cpu)->count = 0;
+	for_each_possible_cpu(cpu) {
+		per_cpu_ptr(mod->refptr, cpu)->incs = 0;
+		per_cpu_ptr(mod->refptr, cpu)->decs = 0;
+	}
 
 	/* Hold reference count during initialization. */
-	__this_cpu_write(mod->refptr->count, 1);
+	__this_cpu_write(mod->refptr->incs, 1);
 	/* Backwards compatibility macros put refcount during init. */
 	mod->waiter = current;
 }
@@ -616,12 +618,28 @@ static int try_stop_module(struct module
 
 unsigned int module_refcount(struct module *mod)
 {
-	unsigned int total = 0;
+	unsigned int incs = 0, decs = 0;
 	int cpu;
 
 	for_each_possible_cpu(cpu)
-		total += per_cpu_ptr(mod->refptr, cpu)->count;
-	return total;
+		decs += per_cpu_ptr(mod->refptr, cpu)->decs;
+	/*
+	 * ensure the incs are added up after the decs.
+	 * module_put ensures incs are visible before decs with smp_wmb.
+	 *
+	 * This 2-count scheme avoids the situation where the refcount
+	 * for CPU0 is read, then CPU0 increments the module refcount,
+	 * then CPU1 drops that refcount, then the refcount for CPU1 is
+	 * read. We would record a decrement but not its corresponding
+	 * increment so we would see a low count (disaster).
+	 *
+	 * Rare situation? But module_refcount can be preempted, and we
+	 * might be tallying up 4096+ CPUs. So it is not impossible.
+	 */
+	smp_rmb();
+	for_each_possible_cpu(cpu)
+		incs += per_cpu_ptr(mod->refptr, cpu)->incs;
+	return incs - decs;
 }
 EXPORT_SYMBOL(module_refcount);
 
@@ -798,10 +816,11 @@ void module_put(struct module *module)
 {
 	if (module) {
 		preempt_disable();
-		__this_cpu_dec(module->refptr->count);
+		smp_wmb(); /* see comment in module_refcount */
+		__this_cpu_inc(module->refptr->decs);
 
 		trace_module_put(module, _RET_IP_,
-				 __this_cpu_read(module->refptr->count));
+				 __this_cpu_read(module->refptr->decs));
 		/* Maybe they're waiting for us to drop reference? */
 		if (unlikely(!module_is_live(module)))
 			wake_up_process(module->waiter);
--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
Is module refcounting racy?, Nick Piggin, (Thu Mar 18, 3:55 am)
Re: Is module refcounting racy?, Rusty Russell, (Mon Mar 29, 2:12 am)
Re: Is module refcounting racy?, Nick Piggin, (Mon Mar 29, 9:58 am)
Re: Is module refcounting racy?, Rusty Russell, (Tue Mar 30, 8:44 pm)
Re: Is module refcounting racy?, Nick Piggin, (Thu Apr 1, 1:09 am)
Re: Is module refcounting racy?, Linus Torvalds, (Thu Apr 1, 8:55 am)
Re: Is module refcounting racy?, Rusty Russell, (Mon Apr 5, 7:39 pm)
Re: Is module refcounting racy?, Nick Piggin, (Mon Apr 5, 10:05 pm)
Re: Is module refcounting racy?, Eric Dumazet, (Mon Apr 5, 11:19 pm)
Re: Is module refcounting racy?, Nick Piggin, (Tue Apr 6, 12:38 am)