Re: An incorrect assumption over radix_tree_tag_get()

From: David Howells
Date: Tuesday, April 6, 2010 - 9:19 am

Hi,

I think I've made a bad assumption over my usage of radix_tree_tag_get() in
fs/fscache/page.c.

I've assumed that radix_tree_tag_get() is protected from radix_tree_tag_set()
and radix_tree_tag_clear() by the RCU read lock.  However, now I'm not so
sure.  I think it's only protected against removal of part of the tree.

Can you confirm?

David
--

From: Nick Piggin
Date: Tuesday, April 6, 2010 - 10:09 am

It is safe. Synchronization requirements for using the radix tree API
are documented.
--

From: Dave Chinner
Date: Tuesday, April 6, 2010 - 4:34 pm

I don't think it is safe - I made modifications to XFS that modified
radix tree tags under a read lock (not RCU), but this resulted in
corrupted tag state as concurrent tag set/clear operations for
different slots propagated through the tree and got mixed up.
Christoph fixed the problem (f1f724e4b523d444c5a598d74505aefa3d6844d2)
by putting all tag modifications under the write lock.  I can't see
how doing tag modifications under RCU read locks is any safer than
doing it under a spinning read lock....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--

From: Nick Piggin
Date: Wednesday, April 7, 2010 - 12:57 am

No, the modifications must all be serialized, but they can run in
parallel with a radix_tree_tag_get().
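
The rule stated above (tag writers exclude each other, while a tag read
may run alongside them) can be sketched in userspace.  This is only an
illustrative model, not kernel code: struct tag_word, sketch_tag_set(),
sketch_tag_clear() and sketch_tag_get() are hypothetical names, a C11
atomic_flag stands in for whatever write-side lock the caller uses, and a
single word of bits stands in for the per-node tag bitmaps.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: one tag bit per slot, plus a write-side lock. */
struct tag_word {
	atomic_flag writer_lock;	/* stands in for the caller's write lock */
	atomic_ulong bits;		/* bit N set means slot N is tagged */
};

static void tag_word_init(struct tag_word *t)
{
	atomic_flag_clear(&t->writer_lock);
	atomic_init(&t->bits, 0UL);
}

/* Writers spin until they own the lock, so set/clear never overlap. */
static void sketch_writer_lock(struct tag_word *t)
{
	while (atomic_flag_test_and_set_explicit(&t->writer_lock,
						 memory_order_acquire))
		;	/* spin */
}

static void sketch_writer_unlock(struct tag_word *t)
{
	atomic_flag_clear_explicit(&t->writer_lock, memory_order_release);
}

/* slot must be smaller than the number of bits in unsigned long. */
static void sketch_tag_set(struct tag_word *t, unsigned int slot)
{
	sketch_writer_lock(t);
	atomic_fetch_or(&t->bits, 1UL << slot);
	sketch_writer_unlock(t);
}

static void sketch_tag_clear(struct tag_word *t, unsigned int slot)
{
	sketch_writer_lock(t);
	atomic_fetch_and(&t->bits, ~(1UL << slot));
	sketch_writer_unlock(t);
}

/* The reader takes no lock; it only loads the current bits. */
static bool sketch_tag_get(struct tag_word *t, unsigned int slot)
{
	return (atomic_load(&t->bits) >> slot) & 1UL;
}
```

A caller would run tag_word_init() once, then use sketch_tag_set() and
sketch_tag_clear() from the writers and sketch_tag_get() from any reader.
The model has a single level, so it says nothing about how a reader sees
tags that are propagated through several levels of a real tree.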

--

From: David Howells
Date: Tuesday, April 6, 2010 - 11:52 am

I presume you mean the big comment on it in radix-tree.h.

According to that, it is not safe:

 * - any function _modifying_ the tree or tags (inserting or deleting
 *   items, setting or clearing tags) must exclude other modifications, and
 *   exclude any functions reading the tree.
 
David
--

From: David Howells
Date: Tuesday, April 6, 2010 - 12:16 pm

Having said that, the next few lines say that it is:

 * The notable exceptions to this rule are the following functions:
 * radix_tree_lookup
 * radix_tree_lookup_slot
 * radix_tree_tag_get
 * radix_tree_gang_lookup
 * radix_tree_gang_lookup_slot
 * radix_tree_gang_lookup_tag
 * radix_tree_gang_lookup_tag_slot
 * radix_tree_tagged

However, I'm not sure I agree that radix_tree_tag_get() belongs in this list.

The bug symptoms are this:

Someone is seeing a bug in which an apparently corrupt radix tree tag chain
is observed in radix_tree_tag_get().  Leastways, the BUG() on line 602 in
radix_tree_tag_get() trips once in a while:

	kernel BUG at
		/usr/src/linux-2.6-2.6.33/debian/build/source_i386_none/lib/radix-tree.c:602!
	RIP: 0010:[<ffffffff81182040>] radix_tree_tag_get+0xbc/0xe3
	 [<ffffffffa0247b67>] ? __fscache_maybe_release_page+0x42/0x115
	 [<ffffffffa0372e7d>] ? nfs_fscache_release_page+0x66/0x99 [nfs]
	 [<ffffffff810b6dee>] ? invalidate_inode_pages2_range+0x15a/0x262
	 [<ffffffffa035312f>] ? nfs_invalidate_mapping_nolock+0x18/0xb4
	 [<ffffffffa0354097>] ? nfs_revalidate_mapping+0x85/0x99 [nfs]
	 [<ffffffffa0351158>] ? nfs_file_splice_read+0x5b/0x8e [nfs]
	 [<ffffffff811043d3>] ? splice_direct_to_actor+0xbe/0x188
	 [<ffffffff81104a1c>] ? direct_splice_actor+0x0/0x1e
	 [<ffffffff81113274>] ? ep_scan_ready_list+0x132/0x151
	 [<ffffffff811044e7>] ? do_splice_direct+0x4a/0x64
	 [<ffffffff810e8fa8>] ? do_sendfile+0x12d/0x1a8
	 [<ffffffff8106685b>] ? getnstimeofday+0x55/0xaf
	 [<ffffffff810e906c>] ? sys_sendfile64+0x49/0x88
	 [<ffffffff8103145f>] ? sysenter_dispatch+0x7/0x2e

which is this:

		if (!tag_get(node, tag, offset))
			saw_unset_tag = 1;
		if (height == 1) {
			int ret = tag_get(node, tag, offset);

	-->		BUG_ON(ret && saw_unset_tag);
			return !!ret;
		}
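
One interleaving that could make this check fire without any real
corruption can be modelled in a few lines.  This is a sketch under stated
assumptions, not a diagnosis: it assumes a tag set walks from the root
toward the leaf, it collapses the tree to two levels, and struct path_tags,
model_tag_set() and model_tag_get_trips_check() are hypothetical names.
The writer is run by hand between the reader's two loads to stand in for a
concurrent CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one tag bit per level on the path to one slot. */
struct path_tags {
	bool level[2];	/* level[0] = root node, level[1] = leaf node */
};

/* Models a tag set that updates the root first, then the leaf (assumed). */
static void model_tag_set(struct path_tags *p)
{
	p->level[0] = true;
	p->level[1] = true;
}

/*
 * Models the reader's walk.  When interleave is true, the writer runs
 * after the reader has sampled the root but before it samples the leaf.
 * Returns true if the condition in BUG_ON(ret && saw_unset_tag) holds.
 */
static bool model_tag_get_trips_check(struct path_tags *p, bool interleave)
{
	bool saw_unset_tag = false;
	bool ret;

	if (!p->level[0])
		saw_unset_tag = true;	/* root looked untagged */

	if (interleave)
		model_tag_set(p);	/* "concurrent" writer runs here */

	ret = p->level[1];		/* leaf now looks tagged */
	return ret && saw_unset_tag;
}
```

Starting from an untagged path, model_tag_get_trips_check(&p, true)
returns true, while the same call with interleave set to false returns
false.  In this model the reader trips the check purely because of when it
sampled each level.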

In fs/fscache/page.c, __fscache_maybe_release_page() does a radix_tree_lookup()
with just the RCU read lock held, and then calls radix_tree_tag_get() a couple
of times.  In this case, it's ...