The mm-of-the-moment snapshot 2010-04-28-16-53 has been uploaded to http://userweb.kernel.org/~akpm/mmotm/ and will soon be available at git://zen-kernel.org/kernel/mmotm.git It contains the following patches against ...
On Thu, 29 Apr 2010 17:32:10 -0400 It Works For Me. Send .config, please? --
On Thu, 29 Apr 2010 14:53:33 -0700 Is OK, I found one. i386 .config below. Yes, it's numa-slab-use-numa_mem_id-for-slab-local-memory-node.patch. I stared at it for a bit and got bored - it's Lee's turn ;) # # Automatically generated make config: don't edit # Linux kernel version: 2.6.34-rc5-mm1 # Thu Apr 29 14:55:57 2010 # # CONFIG_64BIT is not set CONFIG_X86_32=y # CONFIG_X86_64 is not set CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_OUTPUT_FORMAT="elf32-i386" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig" CONFIG_GENERIC_CMOS_UPDATE=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y CONFIG_MMU=y CONFIG_ZONE_DMA=y # CONFIG_NEED_DMA_MAP_STATE is not set CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y # CONFIG_RWSEM_GENERIC_SPINLOCK is not set CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y CONFIG_GENERIC_CALIBRATE_DELAY=y # CONFIG_GENERIC_TIME_VSYSCALL is not set CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_DEFAULT_IDLE=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y # CONFIG_HAVE_CPUMASK_OF_CPU_MAP is not set CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y # CONFIG_ZONE_DMA32 is not set CONFIG_ARCH_POPULATES_NODE_MAP=y # CONFIG_AUDIT_ARCH is not set CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_HAVE_EARLY_RES=y CONFIG_GENERIC_HARDIRQS=y CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_PENDING_IRQ=y CONFIG_USE_GENERIC_SMP_HELPERS=y CONFIG_X86_32_SMP=y CONFIG_X86_HT=y CONFIG_X86_TRAMPOLINE=y CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx ...
Working on it. Thought I'd preserved other archs, but i386 shares some headers and other files with x86_64 and I apparently botched one of those. I have a number of other reworks as well that I hope to send out later today. Lee --
Vlad: Would you have time to try a few patches? I've been doing some rework of the generic percpu numa_node/numa_mem patches. This rework should solve the build problem you encountered. The patch you cite: numa-slab-use-numa_mem_id-for-slab-local-memory-node.patch is the immediate cause of the build failure, but it's really the earlier patches--definitions of numa_node_id() and numa_mem_id()--that are causing it. I have attached a tarball containing a number of incremental patches to the numa-* patches in 22apr or 28apr mmotm. Should be fairly obvious from the name where they fit into the series. The tarball includes Andrew's fix to the previous build problem you encountered [missing paren]. That fix is already in the 28apr mmotm. These patches build, boot and run on x86_64 and ia64. I also built x86-64 mm/slab.o with !NUMA and an i386 kernel cross-compile with the config that Andrew sent out. I hope to post the individual patches to the mailing lists for review later today. The tarball should give you a head start if you're interested. Lee
Not sure if I bollixed up adding the patches to the series file, that
section now looks like:
% grep -n ^numa series
1465:numa-add-generic-percpu-var-numa_node_id-implementation.patch
1466:numa-add-generic-percpu-var-numa_node_id-implementation-fix1.patch
1467:numa-add-generic-percpu-var-numa_node_id-implementation-fix2.patch
1468:numa-x86_64-use-generic-percpu-var-numa_node_id-implementation.patch
1469:numa-x86_64-use-generic-percpu-var-numa_node_id-implementation-fix1.patch
1470:numa-x86_64-use-generic-percpu-var-numa_node_id-implementation-fix2.patch
1471:numa-ia64-use-generic-percpu-var-numa_node_id-implementation.patch
1473:numa-introduce-numa_mem_id-effective-local-memory-node-id.patch
1474:numa-introduce-numa_mem_id-effective-local-memory-node-id-fix.patch
1475:numa-introduce-numa_mem_id-effective-local-memory-node-id-fix2.patch
1476:numa-introduce-numa_mem_id-effective-local-memory-node-id-fix3.patch
1477:numa-ia64-support-numa_mem_id-for-memoryless-nodes.patch
1478:numa-slab-use-numa_mem_id-for-slab-local-memory-node.patch
1479:numa-in-kernel-profiling-use-cpu_to_mem-for-per-cpu-allocations.patch
1480:numa-update-documentation-vm-numa-add-memoryless-node-info.patch
1481:numa-update-documentation-vm-numa-add-memoryless-node-info-fix1.patch
and it dies with:
LANG=C make
CHK include/linux/version.h
CHK include/generated/utsrelease.h
CC arch/x86/kernel/asm-offsets.s
In file included from include/linux/gfp.h:7,
from include/linux/kmod.h:22,
from include/linux/module.h:13,
from include/linux/crypto.h:21,
from arch/x86/kernel/asm-offsets_64.c:8,
from arch/x86/kernel/asm-offsets.c:4:
include/linux/topology.h:246: error: redefinition of 'numa_node_id'
/usr/src/linux-2.6.34-rc5-mmotm0428/arch/x86/include/asm/topology.h:163: note: previous definition of 'numa_node_id' was here
make[1]: *** [arch/x86/kernel/asm-offsets.s] Error 1
make: *** [prepare0] Error 2
X86_64 ...Hmmm. I tested that before I sent it. I was getting that error before I added the "override" definition to arch/x86/include/asm/topology.h. Let me see what I did wrong... Thanks --
Dies trying to compile mm/slab.c because something screwed the pooch with the definition of numa_mem_id(). My bets are on numa-slab-use-numa_mem_id-for-slab-local-memory-node.patch CC mm/slab.o mm/slab.c: In function 'kmem_cache_init': mm/slab.c:1506: error: 'numa_node' undeclared (first use in this function) mm/slab.c:1506: error: (Each undeclared identifier is reported only once mm/slab.c:1506: error: for each function it appears in.) mm/slab.c:1506: warning: type defaults to 'int' in declaration of 'pscr_ret__' mm/slab.c:1506: warning: type defaults to 'int' in declaration of 'type name' mm/slab.c:1506: warning: cast from pointer to integer of different size mm/slab.c:1506: warning: type defaults to 'int' in declaration of 'pfo_ret__' mm/slab.c:1506: warning: type defaults to 'int' in declaration of 'pfo_ret__' mm/slab.c:1506: warning: type defaults to 'int' in declaration of 'pfo_ret__' mm/slab.c:1506: warning: type defaults to 'int' in declaration of 'pfo_ret__' mm/slab.c: In function 'setup_cpu_cache': mm/slab.c:2148: error: 'numa_node' undeclared (first use in this function) mm/slab.c:2148: warning: type defaults to 'int' in declaration of 'pscr_ret__' mm/slab.c:2148: warning: type defaults to 'int' in declaration of 'type name' mm/slab.c:2148: warning: cast from pointer to integer of different size mm/slab.c:2148: warning: type defaults to 'int' in declaration of 'pfo_ret__' mm/slab.c:2148: warning: type defaults to 'int' in declaration of 'pfo_ret__' mm/slab.c:2148: warning: type defaults to 'int' in declaration of 'pfo_ret__' mm/slab.c:2148: warning: type defaults to 'int' in declaration of 'pfo_ret__' mm/slab.c: In function 'do_drain': mm/slab.c:2506: error: 'numa_node' undeclared (first use in this function) mm/slab.c:2506: warning: type defaults to 'int' in declaration of 'pscr_ret__' mm/slab.c:2506: warning: type defaults to 'int' in declaration of 'type name' mm/slab.c:2506: warning: cast from pointer to integer of different size mm/slab.c:2506: warning: ...
The nouveau driver had issues during boot, caused plymouth to have indigestion and fall back to 24x80 character mode. This happened sometime between -mmotm0422 and -mmotm0428. Looks like something in linux-next, cc'ing anybody who apparently touched nouveau code. Will bisect if nobody has a good idea what happened. The X server was able to init the card and get gdm started just fine. lspci says: 01:00.0 VGA compatible controller: nVidia Corporation G98M [Quadro NVS 160M] (rev a1) [ 1.038152] [drm] nouveau 0000:01:00.0: Detected 256MiB VRAM [ 1.194505] [TTM] Zone kernel: Available graphics memory: 2013496 kiB. [ 1.194508] [ttm] Initializing pool allocator. [ 1.194896] ------------[ cut here ]------------ [ 1.194903] WARNING: at arch/x86/mm/ioremap.c:111 __ioremap_caller+0x238/0x3f4() [ 1.194906] Hardware name: Latitude E6500 [ 1.194908] Modules linked in: [ 1.194912] Pid: 1, comm: swapper Not tainted 2.6.34-rc5-mmotm0428 #1 [ 1.194915] Call Trace: [ 1.194921] [<ffffffff81038936>] warn_slowpath_common+0x80/0x98 [ 1.194925] [<ffffffff81038963>] warn_slowpath_null+0x15/0x17 [ 1.194929] [<ffffffff8102064a>] __ioremap_caller+0x238/0x3f4 [ 1.194934] [<ffffffff812b27ba>] ? ttm_mem_reg_ioremap+0x54/0x7b [ 1.194939] [<ffffffff810d6aff>] ? kmem_cache_alloc_notrace+0x68/0xc0 [ 1.194942] [<ffffffff8102088f>] ioremap_nocache+0x12/0x14 [ 1.194946] [<ffffffff812b27ba>] ttm_mem_reg_ioremap+0x54/0x7b [ 1.194950] [<ffffffff812b2b5c>] ttm_bo_move_memcpy+0x5e/0x3e0 [ 1.194954] [<ffffffff812a4d7d>] ? drm_mm_split_at_start+0x79/0x95 [ 1.194959] [<ffffffff8120ec2d>] ? do_raw_spin_unlock+0xd0/0xfa [ 1.194964] [<ffffffff812c21cf>] nouveau_bo_move+0x19e/0x222 [ 1.194968] [<ffffffff812afe79>] ttm_bo_handle_move_mem+0x1b4/0x2c0 [ 1.194972] [<ffffffff812b1d9e>] ttm_bo_move_buffer+0xf4/0x14d [ 1.194976] [<ffffffff81210b2d>] ? __list_add+0xb7/0xd2 [ 1.194980] [<ffffffff812b1ee3>] ...
I thought we swatted all these, hit another one...
[ 9.131490] ctnetlink v0.93: registering with nfnetlink.
[ 9.131535]
[ 9.131535] ===================================================
[ 9.131704] [ INFO: suspicious rcu_dereference_check() usage. ]
[ 9.131794] ---------------------------------------------------
[ 9.131883] net/netfilter/nf_conntrack_ecache.c:88 invoked rcu_dereference_check() without protection!
[ 9.131977]
[ 9.131977] other info that might help us debug this:
[ 9.131978]
[ 9.132218]
[ 9.132219] rcu_scheduler_active = 1, debug_locks = 0
[ 9.132434] 1 lock held by swapper/1:
[ 9.132519] #0: (nf_ct_ecache_mutex){+.+...}, at: [<ffffffff8148922d>] nf_conntrack_register_notifier+0x1a/0x75
[ 9.132938]
[ 9.132939] stack backtrace:
[ 9.133129] Pid: 1, comm: swapper Tainted: G W 2.6.34-rc5-mmotm0428 #1
[ 9.133220] Call Trace:
[ 9.133319] [<ffffffff81064832>] lockdep_rcu_dereference+0xaa/0xb2
[ 9.133410] [<ffffffff81489250>] nf_conntrack_register_notifier+0x3d/0x75
[ 9.133521] [<ffffffff81b5a157>] ctnetlink_init+0x71/0xd5
[ 9.133627] [<ffffffff81b5a0e6>] ? ctnetlink_init+0x0/0xd5
[ 9.133735] [<ffffffff810001ef>] do_one_initcall+0x59/0x14e
[ 9.133843] [<ffffffff81b2e68a>] kernel_init+0x144/0x1ce
[ 9.133949] [<ffffffff81003414>] kernel_thread_helper+0x4/0x10
[ 9.134060] [<ffffffff81598a40>] ? restore_args+0x0/0x30
[ 9.134196] [<ffffffff81b2e546>] ? kernel_init+0x0/0x1ce
[ 9.134328] [<ffffffff81003410>] ? kernel_thread_helper+0x0/0x10
[ 9.134530] ip_tables: (C) 2000-2006 Netfilter Core Team
[ 9.134655] TCP bic registered
Thanks for the report !
We can use rcu_dereference_protected() in those cases.
[PATCH] net: Use rcu_dereference_protected in nf_conntrack_ecache
Writers own nf_ct_ecache_mutex.
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
net/netfilter/nf_conntrack_ecache.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/net/netfilter/nf_conntrack_ecache.c b/net/netfilter/nf_conntrack_ecache.c
index f516961..cdcc764 100644
--- a/net/netfilter/nf_conntrack_ecache.c
+++ b/net/netfilter/nf_conntrack_ecache.c
@@ -85,7 +85,8 @@ int nf_conntrack_register_notifier(struct nf_ct_event_notifier *new)
struct nf_ct_event_notifier *notify;
mutex_lock(&nf_ct_ecache_mutex);
- notify = rcu_dereference(nf_conntrack_event_cb);
+ notify = rcu_dereference_protected(nf_conntrack_event_cb,
+ lockdep_is_held(&nf_ct_ecache_mutex));
if (notify != NULL) {
ret = -EBUSY;
goto out_unlock;
@@ -105,7 +106,8 @@ void nf_conntrack_unregister_notifier(struct nf_ct_event_notifier *new)
struct nf_ct_event_notifier *notify;
mutex_lock(&nf_ct_ecache_mutex);
- notify = rcu_dereference(nf_conntrack_event_cb);
+ notify = rcu_dereference_protected(nf_conntrack_event_cb,
+ lockdep_is_held(&nf_ct_ecache_mutex));
BUG_ON(notify != new);
rcu_assign_pointer(nf_conntrack_event_cb, NULL);
mutex_unlock(&nf_ct_ecache_mutex);
@@ -118,7 +120,8 @@ int nf_ct_expect_register_notifier(struct nf_exp_event_notifier *new)
struct nf_exp_event_notifier *notify;
mutex_lock(&nf_ct_ecache_mutex);
- notify = rcu_dereference(nf_expect_event_cb);
+ notify = rcu_dereference_protected(nf_expect_event_cb,
+ lockdep_is_held(&nf_ct_ecache_mutex));
if (notify != NULL) {
ret = -EBUSY;
goto out_unlock;
@@ -138,7 +141,8 @@ void nf_ct_expect_unregister_notifier(struct nf_exp_event_notifier *new)
struct nf_exp_event_notifier *notify;
mutex_lock(&nf_ct_ecache_mutex);
- notify = ...Sorry, patch _is_ fine, I had one brain collapse when re-reading it, I thought a different mutex was in use in one of the functions. --
Actually we don't need the rcu_dereference() calls at all since registration and unregistration are protected by the mutexes. I queued this patch in nf-next, the only reason why I haven't submitted it yet is that I was unable to get git to cleanly export only the proper set of patches meant for -next due to a few merges, it insists on including 5 patches already merged upstream. If you don't mind ignoring the first 5 patches in the series, I'll send a pull request tonight.
This will clash with upcoming RCU patches, where rcu protected pointer cannot be directly accessed without lockdep splats. --
Thanks for the information, I didn't realize that when looking at those patches. So I guess the correct fix once those patches are merged would be to use rcu_assign_protected() and rcu_access_pointer(). --
Ah, and that's what you did. Sorry for the confusion :) --
The best approach in that case is rcu_dereference_protected() listing the lock that must be held. Of course, your code, so your choice. --
From: Patrick McHardy <kaber@trash.net> Something like "git format-patch origin" doesn't avoid those upstream commits? Weird... Another trick is to specify a commit range using triple-dot "..." notation, such as "origin...master" --
Yeah, it seems to have something to do with me merging the nf-2.6.git tree a few weeks ago since it had patches queued that were too late Thanks, I'll give it another try, the alternative is manually renumbering the entire patchset. --
I *really* thought we swatted a bunch of these - did the fixes not make it
into linux-next or -mm? Your patch fixed that one, but then:
[ 9.128899] Netfilter messages via NETLINK v0.30.
[ 9.128919] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[ 9.129108] CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Please use
[ 9.129110] nf_conntrack.acct=1 kernel parameter, acct=1 nf_conntrack module option or
[ 9.129113] sysctl net.netfilter.nf_conntrack_acct=1 to enable it.
[ 9.129135] ctnetlink v0.93: registering with nfnetlink.
[ 9.129452] ip_tables: (C) 2000-2006 Netfilter Core Team
[ 9.129506]
[ 9.129507] ===================================================
[ 9.129683] [ INFO: suspicious rcu_dereference_check() usage. ]
[ 9.129777] ---------------------------------------------------
[ 9.129872] net/netfilter/nf_log.c:55 invoked rcu_dereference_check() without protection!
[ 9.129969]
[ 9.129969] other info that might help us debug this:
[ 9.129970]
[ 9.130232]
[ 9.130232] rcu_scheduler_active = 1, debug_locks = 0
[ 9.130407] 1 lock held by swapper/1:
[ 9.130525] #0: (nf_log_mutex){+.+...}, at: [<ffffffff81481154>] nf_log_register+0x57/0x10f
[ 9.130955]
[ 9.130956] stack backtrace:
[ 9.131162] Pid: 1, comm: swapper Tainted: G W 2.6.34-rc5-mmotm0428 #2
[ 9.131259] Call Trace:
[ 9.131370] [<ffffffff81064832>] lockdep_rcu_dereference+0xaa/0xb2
[ 9.131466] [<ffffffff814811db>] nf_log_register+0xde/0x10f
[ 9.131579] [<ffffffff81b5ca28>] ? log_tg_init+0x0/0x29
[ 9.131689] [<ffffffff81b5ca4d>] log_tg_init+0x25/0x29
[ 9.131800] [<ffffffff810001ef>] do_one_initcall+0x59/0x14e
[ 9.131912] [<ffffffff81b2e68a>] kernel_init+0x144/0x1ce
[ 9.132033] [<ffffffff81003414>] kernel_thread_helper+0x4/0x10
[ 9.132146] [<ffffffff81598a40>] ? restore_args+0x0/0x30
[ 9.132257] [<ffffffff81b2e546>] ? kernel_init+0x0/0x1ce
[ 9.132370] [<ffffffff81003410>] ? ...You probably know this PROVE_RCU thing is new and reserved to developpers ? We yet have to change all spots were a rcu_dereference() was used without rcu_read_lock(). Not a bug by itself, just lockdep is to be instructed not to shout. Maybe 30 patches already in, and maybe 30 other are still needed. --
Thanks for the report !
[PATCH] net: nf_log RCU fixes
nf_log_register() and nf_log_unregister() use a mutex to have exclusive
access to nf_logers[]. Use appropriate rcu_dereference_protected()
lockdep annotation.
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/net/netfilter/nf_log.c b/net/netfilter/nf_log.c
index 015725a..7df37fd 100644
--- a/net/netfilter/nf_log.c
+++ b/net/netfilter/nf_log.c
@@ -52,7 +52,8 @@ int nf_log_register(u_int8_t pf, struct nf_logger *logger)
} else {
/* register at end of list to honor first register win */
list_add_tail(&logger->list[pf], &nf_loggers_l[pf]);
- llog = rcu_dereference(nf_loggers[pf]);
+ llog = rcu_dereference_protected(nf_loggers[pf],
+ lockdep_is_held(&nf_log_mutex));
if (llog == NULL)
rcu_assign_pointer(nf_loggers[pf], logger);
}
@@ -70,7 +71,8 @@ void nf_log_unregister(struct nf_logger *logger)
mutex_lock(&nf_log_mutex);
for (i = 0; i < ARRAY_SIZE(nf_loggers); i++) {
- c_logger = rcu_dereference(nf_loggers[i]);
+ c_logger = rcu_dereference_protected(nf_loggers[i],
+ lockdep_is_held(&nf_log_mutex));
if (c_logger == logger)
rcu_assign_pointer(nf_loggers[i], NULL);
list_del(&logger->list[i]);
--
Confirming that one fixed. Now it lives a whole 36 seconds before whinging:
[ 35.328729] ===================================================
[ 35.328803] [ INFO: suspicious rcu_dereference_check() usage. ]
[ 35.328837] ---------------------------------------------------
[ 35.328872] net/ipv6/addrconf.c:2977 invoked rcu_dereference_check() without protection!
[ 35.328926]
[ 35.328927] other info that might help us debug this:
[ 35.328928]
[ 35.329016]
[ 35.329016] rcu_scheduler_active = 1, debug_locks = 0
[ 35.329089] 2 locks held by ifconfig/2680:
[ 35.329120] #0: (&p->lock){+.+.+.}, at: [<ffffffff810f5db8>] seq_read+0x3a/0x42d
[ 35.329217] #1: (rcu_read_lock_bh){.+....}, at: [<ffffffff814eec68>] rcu_read_lock_bh+0x0/0x35
[ 35.329322]
[ 35.329323] stack backtrace:
[ 35.329380] Pid: 2680, comm: ifconfig Tainted: G W 2.6.34-rc5-mmotm0428 #3
[ 35.329439] Call Trace:
[ 35.329471] [<ffffffff81064832>] lockdep_rcu_dereference+0xaa/0xb2
[ 35.329514] [<ffffffff814ef3ae>] if6_get_next+0x34/0x6d
[ 35.329554] [<ffffffff814ef3f8>] if6_seq_next+0x11/0x18
[ 35.329595] [<ffffffff810f6083>] seq_read+0x305/0x42d
[ 35.329635] [<ffffffff810f5d7e>] ? seq_read+0x0/0x42d
[ 35.329676] [<ffffffff81129e8c>] proc_reg_read+0x8d/0xac
[ 35.329717] [<ffffffff810db7e0>] vfs_read+0xe0/0x140
[ 35.329758] [<ffffffff810db8f6>] sys_read+0x45/0x69
[ 35.329799] [<ffffffff810025eb>] system_call_fastpath+0x16/0x1b
Maybe I need to go and stick the "RCU whinge multiple times" patch on this
kernel and get it over with. :)
Highly recommended. ;-) And thanks to you for your testing efforts and to Eric for the fixes!!! Thanx, Paul --
For this last one, I think you should push following patch Paul
Followup of commit 3120438ad6
(rcu: Disable lockdep checking in RCU list-traversal primitives)
Or we might introduce a hlist_for_each_entry_continue_rcu_bh() macro...
diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 004908b..b0c7e24 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -435,10 +435,10 @@ static inline void hlist_add_after_rcu(struct hlist_node *prev,
* @member: the name of the hlist_node within the struct.
*/
#define hlist_for_each_entry_continue_rcu(tpos, pos, member) \
- for (pos = rcu_dereference((pos)->next); \
+ for (pos = rcu_dereference_raw((pos)->next); \
pos && ({ prefetch(pos->next); 1; }) && \
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \
- pos = rcu_dereference(pos->next))
+ pos = rcu_dereference_raw(pos->next))
#endif /* __KERNEL__ */
--
I would be happy to if I could find the commit creating hlist_for_each_entry_continue_rcu()... I do see a ca. 2008 patch from Stephen Hemminger: http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg264661.html According to http://patchwork.ozlabs.org/patch/47997/, this is going up the networking tree as of March 18, 2010. So I would be happy to push the patch below, but to do so, I will need to adopt the portion of Stephen's patch that created this primitive. Please let me know how you would like to proceed! --
Hmm, I realize there is a true bug introduced by Stephen patch
Then, net-next-2.6 doesnt yet have your commit Paul to relax
hlist_for_each_entry_rcu(), so its a bit difficult to continue the work.
Thanks
[PATCH net-next-2.6] net: if6_get_next() fix
Must use rcu variant, we are in a rcu_read_lock_bh() section
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 34d2d64..16bb85c 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2979,7 +2979,7 @@ static struct inet6_ifaddr *if6_get_next(struct seq_file *seq,
return ifa;
while (++state->bucket < IN6_ADDR_HSIZE) {
- hlist_for_each_entry(ifa, n,
+ hlist_for_each_entry_rcu(ifa, n,
&inet6_addr_lst[state->bucket], addr_lst) {
if (net_eq(dev_net(ifa->idev->dev), net))
return ifa;
--
From: Eric Dumazet <eric.dumazet@gmail.com> Is that in Linus's tree yet? If it propagates there I can make it propagate to net-next-2.6, you just have to tell me you need it. --
Hmm, it seems it's already in net-next-2.6, sorry for the confusion.
commit 3120438ad68601f341e61e7cb1323b0e1a6ca367
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date: Mon Feb 22 17:04:48 2010 -0800
rcu: Disable lockdep checking in RCU list-traversal primitives
The theory is that use of bare rcu_dereference() is more prone
to error than use of the RCU list-traversal primitives.
Therefore, disable lockdep RCU read-side critical-section
checking in these primitives for the time being. Once all of
the rcu_dereference() uses have been dealt with, it may be time
to re-enable lockdep checking for the RCU list-traversal
primitives.
--
From: Eric Dumazet <eric.dumazet@gmail.com> No problem, just let me know in the future if something upstream needs to propagate. --
Paul, David, here the patch I was thinking about :
Feel free to split it in two parts if you like, I am too tired and must
sleep now ;)
Thanks
[PATCH net-next-2.6] net: rcu fixes
Add hlist_for_each_entry_rcu_bh() and
hlist_for_each_entry_continue_rcu_bh() macros, and use them in
ipv6_get_ifaddr(), if6_get_first() and if6_get_next() to fix lockdeps
warnings.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
include/linux/rculist.h | 29 +++++++++++++++++++++++++++++
net/ipv6/addrconf.c | 16 ++++++++--------
2 files changed, 37 insertions(+), 8 deletions(-)
diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 004908b..4ec3b38 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -429,6 +429,23 @@ static inline void hlist_add_after_rcu(struct hlist_node *prev,
pos = rcu_dereference_raw(pos->next))
/**
+ * hlist_for_each_entry_rcu_bh - iterate over rcu list of given type
+ * @tpos: the type * to use as a loop cursor.
+ * @pos: the &struct hlist_node to use as a loop cursor.
+ * @head: the head for your list.
+ * @member: the name of the hlist_node within the struct.
+ *
+ * This list-traversal primitive may safely run concurrently with
+ * the _rcu list-mutation primitives such as hlist_add_head_rcu()
+ * as long as the traversal is guarded by rcu_read_lock().
+ */
+#define hlist_for_each_entry_rcu_bh(tpos, pos, head, member) \
+ for (pos = rcu_dereference_bh((head)->first); \
+ pos && ({ prefetch(pos->next); 1; }) && \
+ ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \
+ pos = rcu_dereference_bh(pos->next))
+
+/**
* hlist_for_each_entry_continue_rcu - iterate over a hlist continuing after current point
* @tpos: the type * to use as a loop cursor.
* @pos: the &struct hlist_node to use as a loop cursor.
@@ -440,6 +457,18 @@ static inline void hlist_add_after_rcu(struct hlist_node *prev,
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \
pos = ...From: Eric Dumazet <eric.dumazet@gmail.com> Paul, let me know if you want to handle these seperately (one commit in your tree for the rculist.h bit and one for the ipv6 change) or to put it all at once into net-next-2.6, I'm happy either way. --
These changes look pretty closely integrated, so it is probably better if they go up your tree with the related networking changes. I will take a look at them. Thanx, Paul --
Looks good!!! It will collide with Arnd's sparse-based changes, but that will be easy to fix, so no problem. --
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Applied, thanks! --
I've committed this patch to my tree, which also fixes up the nf_log changes I already had queued. I've also figured out how to prevent the false commits from showing up using the '^' notation, I'll submit everything after some final testing.
