[PATCH] ext4: memory black hole in ext4_mb_put_pa()

Previous thread: [Bug 14830] When other IO is running sync times go to 10 to 20 minutes by bugzilla-daemon on Friday, April 2, 2010 - 2:23 pm. (1 message)

Next thread: Changes in quota tools cause xfstests #219 to fail by Theodore Ts'o on Saturday, April 3, 2010 - 2:02 pm. (2 messages)
From: jing zhang
Date: Saturday, April 3, 2010 - 12:16 am

From: Jing Zhang <zj.barak@gmail.com>

Date: Sat  Apr 3  15:11:47     2010

The story about the life of PA starts with kmem_cache_alloc(),
followed by atomic_set() at [line:3369/3429],
	
	...
	[many algorithms of EXT4]
	...
	ac->ac_pa = pa;
	...

and is about to finish in the function, ext4_mb_release_context().

Check-Point-01, [line:4101-4125]
	
	First, the rcu_read_lock() method used conflicts with
	that at [line:4019].
	
	Second, whether RCU, by its defination, is capable of
	protecting competive writers should be concerned.
	
	Third, the locality group, please see [line:2418], is
	designed to be PERCPU.
	
	Also note, after added, pa could be at any position on
	the lg_prealloc_list.
	
	If the length of list is greater than 8, let's goto next check point.

Check-Point-02, [line:4019-4057]

	First, the atomic_read() at [line:4023] is not able to assure that
	this pa is the one we just used for block allocation. It is inited
	to be 1 at [line:3369/3429]. The pa_count at [line:4023] is just the
	number of CAs holding this pa, and I guess its value is 1 in case of
	newly created pa.
	
	Second, PAs are selected without good considerations of the sort
	effort mentioned above.
	
	Third, the selected PAs may include the just added pa at [line:4111],
	(let's lable it pa-4111)
	and if selected, let's goto next check point.

Check-Point-03, [line:4074]

	The victim, pa-4111, is destroied.

Check-Point-04, [line:4167-4168]

	After ext4_mb_add_n_trim() returns,
	ext4_mb_put_pa() may be crashed by pa-4111.

Check-Point-05, [line:3261-3269]

	First, [line:3261-3262] are copied here,

	if (!atomic_dec_and_test(&pa->pa_count) || pa->pa_free != 0)
		return;

	and are changed to,

	__3261	if (!atomic_dec_and_test(&pa->pa_count))
	__3262		return;
	__3263	if (pa->pa_free != 0)
	__3264		return;
	
	If the changed equals the original,
	the return at __3264 will kill the pa which is parameter,
	no matter pa == pa-4111. It is a black hole.
	
	Second, pa may ...
Previous thread: [Bug 14830] When other IO is running sync times go to 10 to 20 minutes by bugzilla-daemon on Friday, April 2, 2010 - 2:23 pm. (1 message)

Next thread: Changes in quota tools cause xfstests #219 to fail by Theodore Ts'o on Saturday, April 3, 2010 - 2:02 pm. (2 messages)