Re: [PATCH net-next-2.6] net: if6_get_next() fix

Previous thread: Re: [C/R ARM v2][PATCH 1/3] ARM: Rudimentary syscall interfaces by Roland McGrath on Wednesday, April 28, 2010 - 5:08 pm. (2 messages)

Next thread: [C/R ARM v2][PATCH 0/3] Linux Checkpoint-Restart - ARM port by Christoffer Dall on Monday, April 26, 2010 - 2:43 pm. (5 messages)
From: akpm
Date: Wednesday, April 28, 2010 - 4:53 pm

The mm-of-the-moment snapshot 2010-04-28-16-53 has been uploaded to

   http://userweb.kernel.org/~akpm/mmotm/

and will soon be available at

   git://zen-kernel.org/kernel/mmotm.git

It contains the following patches against ...
From: Andrew Morton
Date: Thursday, April 29, 2010 - 2:53 pm

On Thu, 29 Apr 2010 17:32:10 -0400

It Works For Me.  Send .config, please?
--

From: Andrew Morton
Date: Thursday, April 29, 2010 - 3:00 pm

On Thu, 29 Apr 2010 14:53:33 -0700

Is OK, I found one.  i386 .config below.

Yes, it's numa-slab-use-numa_mem_id-for-slab-local-memory-node.patch. 
I stared at it for a bit and got bored - it's Lee's turn ;)

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.34-rc5-mm1
# Thu Apr 29 14:55:57 2010
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
# CONFIG_X86_64 is not set
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
# CONFIG_NEED_DMA_MAP_STATE is not set
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
# CONFIG_GENERIC_TIME_VSYSCALL is not set
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
# CONFIG_HAVE_CPUMASK_OF_CPU_MAP is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
# CONFIG_ZONE_DMA32 is not set
CONFIG_ARCH_POPULATES_NODE_MAP=y
# CONFIG_AUDIT_ARCH is not set
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_HAVE_EARLY_RES=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_USE_GENERIC_SMP_HELPERS=y
CONFIG_X86_32_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_TRAMPOLINE=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx ...
From: Lee Schermerhorn
Date: Friday, April 30, 2010 - 5:11 am

Working on it.

Thought I'd preserved other archs, but i386 shares some headers and
other files with x86_64 and I apparently botched one of those.

I have a number of other reworks as well that I hope to send out later
today.

Lee

--

From: Lee Schermerhorn
Date: Friday, April 30, 2010 - 11:02 am

Vlad:

Would you have time to try a few patches?  I've been doing some rework
of the generic percpu numa_node/numa_mem patches.  This rework should
solve the build problem you encountered.

The patch you cite:
numa-slab-use-numa_mem_id-for-slab-local-memory-node.patch
is the immediate cause of the build failure, but it's really the earlier
patches--definitions of numa_node_id() and numa_mem_id()--that are
causing it. 

I have attached a tarball containing a number of incremental patches to
the numa-* patches in 22apr or 28apr mmotm.  Should be fairly obvious
from the name where they fit into the series.  The tarball includes
Andrew's fix to the previous build problem you encountered [missing
paren].  That fix is already in the 28apr mmotm.

These patches build, boot and run on x86_64 and ia64.  I also built
x86-64 mm/slab.o with !NUMA and an i386 kernel cross-compile with the
config that Andrew sent out.

I hope to post the individual patches to the mailing lists for review
later today.  The tarball should give you a head start if you're
interested.

Lee
From: Valdis.Kletnieks
Date: Friday, April 30, 2010 - 11:47 am

Not sure if I bollixed up adding the patches to the series file, that
section now looks like:

% grep -n ^numa series
1465:numa-add-generic-percpu-var-numa_node_id-implementation.patch
1466:numa-add-generic-percpu-var-numa_node_id-implementation-fix1.patch
1467:numa-add-generic-percpu-var-numa_node_id-implementation-fix2.patch
1468:numa-x86_64-use-generic-percpu-var-numa_node_id-implementation.patch
1469:numa-x86_64-use-generic-percpu-var-numa_node_id-implementation-fix1.patch
1470:numa-x86_64-use-generic-percpu-var-numa_node_id-implementation-fix2.patch
1471:numa-ia64-use-generic-percpu-var-numa_node_id-implementation.patch
1473:numa-introduce-numa_mem_id-effective-local-memory-node-id.patch
1474:numa-introduce-numa_mem_id-effective-local-memory-node-id-fix.patch
1475:numa-introduce-numa_mem_id-effective-local-memory-node-id-fix2.patch
1476:numa-introduce-numa_mem_id-effective-local-memory-node-id-fix3.patch
1477:numa-ia64-support-numa_mem_id-for-memoryless-nodes.patch
1478:numa-slab-use-numa_mem_id-for-slab-local-memory-node.patch
1479:numa-in-kernel-profiling-use-cpu_to_mem-for-per-cpu-allocations.patch
1480:numa-update-documentation-vm-numa-add-memoryless-node-info.patch
1481:numa-update-documentation-vm-numa-add-memoryless-node-info-fix1.patch

and it dies with:

 LANG=C make
  CHK     include/linux/version.h
  CHK     include/generated/utsrelease.h
  CC      arch/x86/kernel/asm-offsets.s
In file included from include/linux/gfp.h:7,
                 from include/linux/kmod.h:22,
                 from include/linux/module.h:13,
                 from include/linux/crypto.h:21,
                 from arch/x86/kernel/asm-offsets_64.c:8,
                 from arch/x86/kernel/asm-offsets.c:4:
include/linux/topology.h:246: error: redefinition of 'numa_node_id'
/usr/src/linux-2.6.34-rc5-mmotm0428/arch/x86/include/asm/topology.h:163: note: previous definition of 'numa_node_id' was here
make[1]: *** [arch/x86/kernel/asm-offsets.s] Error 1
make: *** [prepare0] Error 2

X86_64 ...
From: Lee Schermerhorn
Date: Friday, April 30, 2010 - 12:05 pm

Hmmm.  I tested that before I sent it.  I was getting that error before
I added the "override" definition to arch/x86/include/asm/topology.h.

Let me see what I did wrong...

Thanks

--

From: Valdis.Kletnieks
Date: Thursday, April 29, 2010 - 2:32 pm

Dies trying to compile mm/slab.c because something screwed the pooch with
the definition of numa_mem_id(). My bets are on
numa-slab-use-numa_mem_id-for-slab-local-memory-node.patch

  CC      mm/slab.o
mm/slab.c: In function 'kmem_cache_init':
mm/slab.c:1506: error: 'numa_node' undeclared (first use in this function)
mm/slab.c:1506: error: (Each undeclared identifier is reported only once
mm/slab.c:1506: error: for each function it appears in.)
mm/slab.c:1506: warning: type defaults to 'int' in declaration of 'pscr_ret__'
mm/slab.c:1506: warning: type defaults to 'int' in declaration of 'type name'
mm/slab.c:1506: warning: cast from pointer to integer of different size
mm/slab.c:1506: warning: type defaults to 'int' in declaration of 'pfo_ret__'
mm/slab.c:1506: warning: type defaults to 'int' in declaration of 'pfo_ret__'
mm/slab.c:1506: warning: type defaults to 'int' in declaration of 'pfo_ret__'
mm/slab.c:1506: warning: type defaults to 'int' in declaration of 'pfo_ret__'
mm/slab.c: In function 'setup_cpu_cache':
mm/slab.c:2148: error: 'numa_node' undeclared (first use in this function)
mm/slab.c:2148: warning: type defaults to 'int' in declaration of 'pscr_ret__'
mm/slab.c:2148: warning: type defaults to 'int' in declaration of 'type name'
mm/slab.c:2148: warning: cast from pointer to integer of different size
mm/slab.c:2148: warning: type defaults to 'int' in declaration of 'pfo_ret__'
mm/slab.c:2148: warning: type defaults to 'int' in declaration of 'pfo_ret__'
mm/slab.c:2148: warning: type defaults to 'int' in declaration of 'pfo_ret__'
mm/slab.c:2148: warning: type defaults to 'int' in declaration of 'pfo_ret__'
mm/slab.c: In function 'do_drain':
mm/slab.c:2506: error: 'numa_node' undeclared (first use in this function)
mm/slab.c:2506: warning: type defaults to 'int' in declaration of 'pscr_ret__'
mm/slab.c:2506: warning: type defaults to 'int' in declaration of 'type name'
mm/slab.c:2506: warning: cast from pointer to integer of different size
mm/slab.c:2506: warning: ...
From: Valdis.Kletnieks
Date: Sunday, May 2, 2010 - 10:38 am

The nouveau driver had issues during boot, caused plymouth to have indigestion
and fall back to 24x80 character mode. This happened sometime between
-mmotm0422 and -mmotm0428.  Looks like something in linux-next, cc'ing
anybody who apparently touched nouveau code.  Will bisect if nobody has
a good idea what happened.

The X server was able to init the card and get gdm started just fine.

lspci says: 01:00.0 VGA compatible controller: nVidia Corporation G98M [Quadro NVS 160M] (rev a1)

[    1.038152] [drm] nouveau 0000:01:00.0: Detected 256MiB VRAM
[    1.194505] [TTM] Zone  kernel: Available graphics memory: 2013496 kiB.
[    1.194508] [ttm] Initializing pool allocator.
[    1.194896] ------------[ cut here ]------------
[    1.194903] WARNING: at arch/x86/mm/ioremap.c:111 __ioremap_caller+0x238/0x3f4()
[    1.194906] Hardware name: Latitude E6500                  
[    1.194908] Modules linked in:
[    1.194912] Pid: 1, comm: swapper Not tainted 2.6.34-rc5-mmotm0428 #1
[    1.194915] Call Trace:
[    1.194921]  [<ffffffff81038936>] warn_slowpath_common+0x80/0x98
[    1.194925]  [<ffffffff81038963>] warn_slowpath_null+0x15/0x17
[    1.194929]  [<ffffffff8102064a>] __ioremap_caller+0x238/0x3f4
[    1.194934]  [<ffffffff812b27ba>] ? ttm_mem_reg_ioremap+0x54/0x7b
[    1.194939]  [<ffffffff810d6aff>] ? kmem_cache_alloc_notrace+0x68/0xc0
[    1.194942]  [<ffffffff8102088f>] ioremap_nocache+0x12/0x14
[    1.194946]  [<ffffffff812b27ba>] ttm_mem_reg_ioremap+0x54/0x7b
[    1.194950]  [<ffffffff812b2b5c>] ttm_bo_move_memcpy+0x5e/0x3e0
[    1.194954]  [<ffffffff812a4d7d>] ? drm_mm_split_at_start+0x79/0x95
[    1.194959]  [<ffffffff8120ec2d>] ? do_raw_spin_unlock+0xd0/0xfa
[    1.194964]  [<ffffffff812c21cf>] nouveau_bo_move+0x19e/0x222
[    1.194968]  [<ffffffff812afe79>] ttm_bo_handle_move_mem+0x1b4/0x2c0
[    1.194972]  [<ffffffff812b1d9e>] ttm_bo_move_buffer+0xf4/0x14d
[    1.194976]  [<ffffffff81210b2d>] ? __list_add+0xb7/0xd2
[    1.194980]  [<ffffffff812b1ee3>] ...
From: Valdis.Kletnieks
Date: Sunday, May 2, 2010 - 10:46 am

I thought we swatted all these, hit another one...

[    9.131490] ctnetlink v0.93: registering with nfnetlink.
[    9.131535]
[    9.131535] ===================================================
[    9.131704] [ INFO: suspicious rcu_dereference_check() usage. ]
[    9.131794] ---------------------------------------------------
[    9.131883] net/netfilter/nf_conntrack_ecache.c:88 invoked rcu_dereference_check() without protection!
[    9.131977]
[    9.131977] other info that might help us debug this:
[    9.131978]
[    9.132218]
[    9.132219] rcu_scheduler_active = 1, debug_locks = 0
[    9.132434] 1 lock held by swapper/1:
[    9.132519]  #0:  (nf_ct_ecache_mutex){+.+...}, at: [<ffffffff8148922d>] nf_conntrack_register_notifier+0x1a/0x75
[    9.132938]
[    9.132939] stack backtrace:
[    9.133129] Pid: 1, comm: swapper Tainted: G        W   2.6.34-rc5-mmotm0428 #1
[    9.133220] Call Trace:
[    9.133319]  [<ffffffff81064832>] lockdep_rcu_dereference+0xaa/0xb2
[    9.133410]  [<ffffffff81489250>] nf_conntrack_register_notifier+0x3d/0x75
[    9.133521]  [<ffffffff81b5a157>] ctnetlink_init+0x71/0xd5
[    9.133627]  [<ffffffff81b5a0e6>] ? ctnetlink_init+0x0/0xd5
[    9.133735]  [<ffffffff810001ef>] do_one_initcall+0x59/0x14e
[    9.133843]  [<ffffffff81b2e68a>] kernel_init+0x144/0x1ce
[    9.133949]  [<ffffffff81003414>] kernel_thread_helper+0x4/0x10
[    9.134060]  [<ffffffff81598a40>] ? restore_args+0x0/0x30
[    9.134196]  [<ffffffff81b2e546>] ? kernel_init+0x0/0x1ce
[    9.134328]  [<ffffffff81003410>] ? kernel_thread_helper+0x0/0x10
[    9.134530] ip_tables: (C) 2000-2006 Netfilter Core Team
[    9.134655] TCP bic registered

From: Eric Dumazet
Date: Sunday, May 2, 2010 - 10:38 pm

Thanks for the report !

We can use rcu_dereference_protected() in those cases.

[PATCH] net: Use rcu_dereference_protected in nf_conntrack_ecache

Writers own nf_ct_ecache_mutex.

Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/netfilter/nf_conntrack_ecache.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nf_conntrack_ecache.c b/net/netfilter/nf_conntrack_ecache.c
index f516961..cdcc764 100644
--- a/net/netfilter/nf_conntrack_ecache.c
+++ b/net/netfilter/nf_conntrack_ecache.c
@@ -85,7 +85,8 @@ int nf_conntrack_register_notifier(struct nf_ct_event_notifier *new)
 	struct nf_ct_event_notifier *notify;
 
 	mutex_lock(&nf_ct_ecache_mutex);
-	notify = rcu_dereference(nf_conntrack_event_cb);
+	notify = rcu_dereference_protected(nf_conntrack_event_cb,
+					   lockdep_is_held(&nf_ct_ecache_mutex));
 	if (notify != NULL) {
 		ret = -EBUSY;
 		goto out_unlock;
@@ -105,7 +106,8 @@ void nf_conntrack_unregister_notifier(struct nf_ct_event_notifier *new)
 	struct nf_ct_event_notifier *notify;
 
 	mutex_lock(&nf_ct_ecache_mutex);
-	notify = rcu_dereference(nf_conntrack_event_cb);
+	notify = rcu_dereference_protected(nf_conntrack_event_cb,
+					   lockdep_is_held(&nf_ct_ecache_mutex));
 	BUG_ON(notify != new);
 	rcu_assign_pointer(nf_conntrack_event_cb, NULL);
 	mutex_unlock(&nf_ct_ecache_mutex);
@@ -118,7 +120,8 @@ int nf_ct_expect_register_notifier(struct nf_exp_event_notifier *new)
 	struct nf_exp_event_notifier *notify;
 
 	mutex_lock(&nf_ct_ecache_mutex);
-	notify = rcu_dereference(nf_expect_event_cb);
+	notify = rcu_dereference_protected(nf_expect_event_cb,
+					   lockdep_is_held(&nf_ct_ecache_mutex));
 	if (notify != NULL) {
 		ret = -EBUSY;
 		goto out_unlock;
@@ -138,7 +141,8 @@ void nf_ct_expect_unregister_notifier(struct nf_exp_event_notifier *new)
 	struct nf_exp_event_notifier *notify;
 
 	mutex_lock(&nf_ct_ecache_mutex);
-	notify = ...
From: Eric Dumazet
Date: Sunday, May 2, 2010 - 10:41 pm

Oops scratch that, I'll resend a correct version.



--

From: Eric Dumazet
Date: Sunday, May 2, 2010 - 10:43 pm

Sorry, patch _is_ fine, I had one brain collapse when re-reading it, I
thought a different mutex was in use in one of the functions.


--

From: David Miller
Date: Sunday, May 2, 2010 - 10:55 pm

From: Eric Dumazet <eric.dumazet@gmail.com>

Ok, Patrick please review, thanks.
--

From: Patrick McHardy
Date: Monday, May 10, 2010 - 8:40 am

Actually we don't need the rcu_dereference() calls at all since
registration and unregistration are protected by the mutexes.

I queued this patch in nf-next, the only reason why I haven't
submitted it yet is that I was unable to get git to cleanly export
only the proper set of patches meant for -next due to a few merges,
it insists on including 5 patches already merged upstream. If you
don't mind ignoring the first 5 patches in the series, I'll send a
pull request tonight.


From: Eric Dumazet
Date: Monday, May 10, 2010 - 8:56 am

This will clash with upcoming RCU patches, where rcu protected pointer
cannot be directly accessed without lockdep splats.



--

From: Eric Dumazet
Date: Monday, May 10, 2010 - 8:57 am

From: Patrick McHardy
Date: Monday, May 10, 2010 - 9:03 am

Thanks for the information, I didn't realize that when looking at
those patches. So I guess the correct fix once those patches are
merged would be to use rcu_assign_protected() and rcu_access_pointer().
--

From: Patrick McHardy
Date: Monday, May 10, 2010 - 9:04 am

Ah, and that's what you did. Sorry for the confusion :)
--

From: Paul E. McKenney
Date: Monday, May 10, 2010 - 8:57 am

The best approach in that case is rcu_dereference_protected() listing
the lock that must be held.  Of course, your code, so your choice.


--

From: David Miller
Date: Monday, May 10, 2010 - 9:12 am

From: Patrick McHardy <kaber@trash.net>

Something like "git format-patch origin" doesn't avoid those upstream
commits?  Weird...

Another trick is to specify a commit range using triple-dot "..."
notation, such as "origin...master"
--

From: Patrick McHardy
Date: Monday, May 10, 2010 - 9:27 am

Yeah, it seems to have something to do with me merging the nf-2.6.git
tree a few weeks ago since it had patches queued that were too late

Thanks, I'll give it another try, the alternative is manually
renumbering the entire patchset.
--

From: Valdis.Kletnieks
Date: Monday, May 3, 2010 - 7:30 am

I *really* thought we swatted a bunch of these - did the fixes not make it
into linux-next or -mm?  Your patch fixed that one, but then:

[    9.128899] Netfilter messages via NETLINK v0.30.
[    9.128919] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[    9.129108] CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Please use
[    9.129110] nf_conntrack.acct=1 kernel parameter, acct=1 nf_conntrack module option or
[    9.129113] sysctl net.netfilter.nf_conntrack_acct=1 to enable it.
[    9.129135] ctnetlink v0.93: registering with nfnetlink.
[    9.129452] ip_tables: (C) 2000-2006 Netfilter Core Team
[    9.129506] 
[    9.129507] ===================================================
[    9.129683] [ INFO: suspicious rcu_dereference_check() usage. ]
[    9.129777] ---------------------------------------------------
[    9.129872] net/netfilter/nf_log.c:55 invoked rcu_dereference_check() without protection!
[    9.129969] 
[    9.129969] other info that might help us debug this:
[    9.129970] 
[    9.130232] 
[    9.130232] rcu_scheduler_active = 1, debug_locks = 0
[    9.130407] 1 lock held by swapper/1:
[    9.130525]  #0:  (nf_log_mutex){+.+...}, at: [<ffffffff81481154>] nf_log_register+0x57/0x10f
[    9.130955] 
[    9.130956] stack backtrace:
[    9.131162] Pid: 1, comm: swapper Tainted: G        W   2.6.34-rc5-mmotm0428 #2
[    9.131259] Call Trace:
[    9.131370]  [<ffffffff81064832>] lockdep_rcu_dereference+0xaa/0xb2
[    9.131466]  [<ffffffff814811db>] nf_log_register+0xde/0x10f
[    9.131579]  [<ffffffff81b5ca28>] ? log_tg_init+0x0/0x29
[    9.131689]  [<ffffffff81b5ca4d>] log_tg_init+0x25/0x29
[    9.131800]  [<ffffffff810001ef>] do_one_initcall+0x59/0x14e
[    9.131912]  [<ffffffff81b2e68a>] kernel_init+0x144/0x1ce
[    9.132033]  [<ffffffff81003414>] kernel_thread_helper+0x4/0x10
[    9.132146]  [<ffffffff81598a40>] ? restore_args+0x0/0x30
[    9.132257]  [<ffffffff81b2e546>] ? kernel_init+0x0/0x1ce
[    9.132370]  [<ffffffff81003410>] ? ...
From: Eric Dumazet
Date: Monday, May 3, 2010 - 7:48 am

You probably know this PROVE_RCU thing is new and reserved to
developpers ?

We yet have to change all spots were a rcu_dereference() was used
without rcu_read_lock(). Not a bug by itself, just lockdep is to be
instructed not to shout.

Maybe 30 patches already in, and maybe 30 other are still needed.



--

From: Eric Dumazet
Date: Monday, May 3, 2010 - 7:58 am

Thanks for the report !

[PATCH] net: nf_log RCU fixes

nf_log_register() and nf_log_unregister() use a mutex to have exclusive
access to nf_logers[]. Use appropriate rcu_dereference_protected()
lockdep annotation.

Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/net/netfilter/nf_log.c b/net/netfilter/nf_log.c
index 015725a..7df37fd 100644
--- a/net/netfilter/nf_log.c
+++ b/net/netfilter/nf_log.c
@@ -52,7 +52,8 @@ int nf_log_register(u_int8_t pf, struct nf_logger *logger)
 	} else {
 		/* register at end of list to honor first register win */
 		list_add_tail(&logger->list[pf], &nf_loggers_l[pf]);
-		llog = rcu_dereference(nf_loggers[pf]);
+		llog = rcu_dereference_protected(nf_loggers[pf],
+						 lockdep_is_held(&nf_log_mutex));
 		if (llog == NULL)
 			rcu_assign_pointer(nf_loggers[pf], logger);
 	}
@@ -70,7 +71,8 @@ void nf_log_unregister(struct nf_logger *logger)
 
 	mutex_lock(&nf_log_mutex);
 	for (i = 0; i < ARRAY_SIZE(nf_loggers); i++) {
-		c_logger = rcu_dereference(nf_loggers[i]);
+		c_logger = rcu_dereference_protected(nf_loggers[i],
+						     lockdep_is_held(&nf_log_mutex));
 		if (c_logger == logger)
 			rcu_assign_pointer(nf_loggers[i], NULL);
 		list_del(&logger->list[i]);


--

From: Valdis.Kletnieks
Date: Monday, May 3, 2010 - 8:29 am

Confirming that one fixed.  Now it lives a whole 36 seconds before whinging:

[   35.328729] ===================================================
[   35.328803] [ INFO: suspicious rcu_dereference_check() usage. ]
[   35.328837] ---------------------------------------------------
[   35.328872] net/ipv6/addrconf.c:2977 invoked rcu_dereference_check() without protection!
[   35.328926] 
[   35.328927] other info that might help us debug this:
[   35.328928] 
[   35.329016] 
[   35.329016] rcu_scheduler_active = 1, debug_locks = 0
[   35.329089] 2 locks held by ifconfig/2680:
[   35.329120]  #0:  (&p->lock){+.+.+.}, at: [<ffffffff810f5db8>] seq_read+0x3a/0x42d
[   35.329217]  #1:  (rcu_read_lock_bh){.+....}, at: [<ffffffff814eec68>] rcu_read_lock_bh+0x0/0x35
[   35.329322] 
[   35.329323] stack backtrace:
[   35.329380] Pid: 2680, comm: ifconfig Tainted: G        W   2.6.34-rc5-mmotm0428 #3
[   35.329439] Call Trace:
[   35.329471]  [<ffffffff81064832>] lockdep_rcu_dereference+0xaa/0xb2
[   35.329514]  [<ffffffff814ef3ae>] if6_get_next+0x34/0x6d
[   35.329554]  [<ffffffff814ef3f8>] if6_seq_next+0x11/0x18
[   35.329595]  [<ffffffff810f6083>] seq_read+0x305/0x42d
[   35.329635]  [<ffffffff810f5d7e>] ? seq_read+0x0/0x42d
[   35.329676]  [<ffffffff81129e8c>] proc_reg_read+0x8d/0xac
[   35.329717]  [<ffffffff810db7e0>] vfs_read+0xe0/0x140
[   35.329758]  [<ffffffff810db8f6>] sys_read+0x45/0x69
[   35.329799]  [<ffffffff810025eb>] system_call_fastpath+0x16/0x1b

Maybe I need to go and stick the "RCU whinge multiple times" patch on this
kernel and get it over with. :)

From: Paul E. McKenney
Date: Monday, May 3, 2010 - 8:43 am

Highly recommended.  ;-)

And thanks to you for your testing efforts and to Eric for the fixes!!!

							Thanx, Paul
--

From: Eric Dumazet
Date: Monday, May 3, 2010 - 9:14 am

For this last one, I think you should push following patch Paul

Followup of commit 3120438ad6
(rcu: Disable lockdep checking in RCU list-traversal primitives)

Or we might introduce a hlist_for_each_entry_continue_rcu_bh() macro...



diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 004908b..b0c7e24 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -435,10 +435,10 @@ static inline void hlist_add_after_rcu(struct hlist_node *prev,
  * @member:	the name of the hlist_node within the struct.
  */
 #define hlist_for_each_entry_continue_rcu(tpos, pos, member)		\
-	for (pos = rcu_dereference((pos)->next);			\
+	for (pos = rcu_dereference_raw((pos)->next);			\
 	     pos && ({ prefetch(pos->next); 1; }) &&			\
 	     ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });  \
-	     pos = rcu_dereference(pos->next))
+	     pos = rcu_dereference_raw(pos->next))
 
 
 #endif	/* __KERNEL__ */


--

From: Paul E. McKenney
Date: Monday, May 3, 2010 - 11:16 am

I would be happy to if I could find the commit creating
hlist_for_each_entry_continue_rcu()...

I do see a ca. 2008 patch from Stephen Hemminger:

	http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg264661.html

According to http://patchwork.ozlabs.org/patch/47997/, this is
going up the networking tree as of March 18, 2010.

So I would be happy to push the patch below, but to do so, I will need
to adopt the portion of Stephen's patch that created this primitive.

Please let me know how you would like to proceed!

--

From: Eric Dumazet
Date: Monday, May 3, 2010 - 12:46 pm

Hmm,  I realize there is a true bug introduced by Stephen patch

Then, net-next-2.6 doesnt yet have your commit Paul to relax
hlist_for_each_entry_rcu(), so its a bit difficult to continue the work.

Thanks

[PATCH net-next-2.6] net: if6_get_next() fix

Must use rcu variant, we are in a rcu_read_lock_bh() section

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 34d2d64..16bb85c 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2979,7 +2979,7 @@ static struct inet6_ifaddr *if6_get_next(struct seq_file *seq,
 			return ifa;
 
 	while (++state->bucket < IN6_ADDR_HSIZE) {
-		hlist_for_each_entry(ifa, n,
+		hlist_for_each_entry_rcu(ifa, n,
 				     &inet6_addr_lst[state->bucket], addr_lst) {
 			if (net_eq(dev_net(ifa->idev->dev), net))
 				return ifa;


--

From: David Miller
Date: Monday, May 3, 2010 - 1:09 pm

From: Eric Dumazet <eric.dumazet@gmail.com>

Is that in Linus's tree yet?  If it propagates there I can make it
propagate to net-next-2.6, you just have to tell me you need it.
--

From: Eric Dumazet
Date: Monday, May 3, 2010 - 1:13 pm

Hmm, it seems it's already in net-next-2.6, sorry for the confusion.

commit 3120438ad68601f341e61e7cb1323b0e1a6ca367
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Mon Feb 22 17:04:48 2010 -0800

    rcu: Disable lockdep checking in RCU list-traversal primitives
    
    The theory is that use of bare rcu_dereference() is more prone
    to error than use of the RCU list-traversal primitives.
    Therefore, disable lockdep RCU read-side critical-section
    checking in these primitives for the time being.  Once all of
    the rcu_dereference() uses have been dealt with, it may be time
    to re-enable lockdep checking for the RCU list-traversal
    primitives.
    


--

From: David Miller
Date: Monday, May 3, 2010 - 1:24 pm

From: Eric Dumazet <eric.dumazet@gmail.com>

No problem, just let me know in the future if something upstream needs
to propagate.
--

From: Eric Dumazet
Date: Monday, May 3, 2010 - 1:50 pm

Paul, David, here the patch I was thinking about :

Feel free to split it in two parts if you like, I am too tired and must
sleep now ;)

Thanks

[PATCH net-next-2.6] net: rcu fixes

Add hlist_for_each_entry_rcu_bh() and
hlist_for_each_entry_continue_rcu_bh() macros, and use them in
ipv6_get_ifaddr(), if6_get_first() and if6_get_next() to fix lockdeps
warnings.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/rculist.h |   29 +++++++++++++++++++++++++++++
 net/ipv6/addrconf.c     |   16 ++++++++--------
 2 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 004908b..4ec3b38 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -429,6 +429,23 @@ static inline void hlist_add_after_rcu(struct hlist_node *prev,
 		pos = rcu_dereference_raw(pos->next))
 
 /**
+ * hlist_for_each_entry_rcu_bh - iterate over rcu list of given type
+ * @tpos:	the type * to use as a loop cursor.
+ * @pos:	the &struct hlist_node to use as a loop cursor.
+ * @head:	the head for your list.
+ * @member:	the name of the hlist_node within the struct.
+ *
+ * This list-traversal primitive may safely run concurrently with
+ * the _rcu list-mutation primitives such as hlist_add_head_rcu()
+ * as long as the traversal is guarded by rcu_read_lock().
+ */
+#define hlist_for_each_entry_rcu_bh(tpos, pos, head, member)		 \
+	for (pos = rcu_dereference_bh((head)->first);			 \
+		pos && ({ prefetch(pos->next); 1; }) &&			 \
+		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \
+		pos = rcu_dereference_bh(pos->next))
+
+/**
  * hlist_for_each_entry_continue_rcu - iterate over a hlist continuing after current point
  * @tpos:	the type * to use as a loop cursor.
  * @pos:	the &struct hlist_node to use as a loop cursor.
@@ -440,6 +457,18 @@ static inline void hlist_add_after_rcu(struct hlist_node *prev,
 	     ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });  \
 	     pos = ...
From: David Miller
Date: Monday, May 3, 2010 - 3:17 pm

From: Eric Dumazet <eric.dumazet@gmail.com>

Paul, let me know if you want to handle these seperately (one commit
in your tree for the rculist.h bit and one for the ipv6 change) or to
put it all at once into net-next-2.6, I'm happy either way.
--

From: Paul E. McKenney
Date: Monday, May 3, 2010 - 3:48 pm

These changes look pretty closely integrated, so it is probably better
if they go up your tree with the related networking changes.  I will
take a look at them.

							Thanx, Paul
--

From: Paul E. McKenney
Date: Monday, May 3, 2010 - 3:52 pm

Looks good!!!

It will collide with Arnd's sparse-based changes, but that will be
easy to fix, so no problem.

--

From: David Miller
Date: Monday, May 3, 2010 - 3:54 pm

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Applied, thanks!
--

From: Patrick McHardy
Date: Monday, May 10, 2010 - 9:53 am

I've committed this patch to my tree, which also fixes up the nf_log
changes I already had queued.

I've also figured out how to prevent the false commits from showing
up using the '^' notation, I'll submit everything after some final
testing.


Previous thread: Re: [C/R ARM v2][PATCH 1/3] ARM: Rudimentary syscall interfaces by Roland McGrath on Wednesday, April 28, 2010 - 5:08 pm. (2 messages)

Next thread: [C/R ARM v2][PATCH 0/3] Linux Checkpoint-Restart - ARM port by Christoffer Dall on Monday, April 26, 2010 - 2:43 pm. (5 messages)