Re: [PATCH] ext4: memory leakage in ext4_discard_preallocations

From: jing zhang
Date: Thursday, March 18, 2010 - 5:39 am

From: Jing Zhang <zj.barak@gmail.com>

Date: Thu Mar 18 20:33:44 2010

When an unexpected error occurs while discarding preallocations, the
entries left on the temporary list are leaked, along with other problems.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Jing Zhang <zj.barak@gmail.com>

---

--- linux-2.6.32/fs/ext4/mballoc.c	2009-12-03 11:51:22.000000000 +0800
+++ zj/mballoc.c	2010-03-18 20:41:32.000000000 +0800
@@ -3717,6 +3717,7 @@ void ext4_discard_preallocations(struct
 	struct list_head list;
 	struct ext4_buddy e4b;
 	int err;
+	int occurs = 0;

 	if (!S_ISREG(inode->i_mode)) {
 		/*BUG_ON(!list_empty(&ei->i_prealloc_list));*/
@@ -3781,6 +3782,7 @@ repeat:
 	}
 	spin_unlock(&ei->i_prealloc_lock);

+best_efforts:
 	list_for_each_entry_safe(pa, tmp, &list, u.pa_tmp_list) {
 		BUG_ON(pa->pa_type != MB_INODE_PA);
 		ext4_get_group_no_and_offset(sb, pa->pa_pstart, &group, NULL);
@@ -3811,6 +3813,12 @@ repeat:
 		list_del(&pa->u.pa_tmp_list);
 		call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback);
 	}
+	if (! list_empty(&list)) {
+		if (occurs++ < 2)
+			goto best_efforts;
+		else
+			BUG();
+	}
 	if (ac)
 		kmem_cache_free(ext4_ac_cachep, ac);
 }
--

From: tytso
Date: Thursday, March 18, 2010 - 10:46 am

Hmm, I'm not sure that BUG() is appropriate here.  If there is an
I/O error reading the block bitmap, #1, retrying isn't going to help,
and #2, bringing down the entire system just because of an I/O error
in reading the block bitmap doesn't seem right.

Right now, if there is a problem, we just end up leaving the
preallocated list on the inode.  Does that cause problems later on
down the line which you have observed?

					- Ted
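
The first point above (retrying does not help when the failure is
persistent) can be illustrated with a small userspace model. This is only
a sketch, not kernel code: struct model_pa, model_load_group() and
model_discard() are hypothetical names invented for the illustration, and
the "bad group" parameter stands in for a block bitmap that cannot be read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a preallocation entry; not the kernel struct. */
struct model_pa {
	int group;	/* block group the entry belongs to */
	int released;	/* set to 1 once the entry's space has been freed */
};

/* Hypothetical loader: fails with an EIO-like code for one bad group. */
static int model_load_group(int group, int bad_group)
{
	return group == bad_group ? -5 : 0;
}

/*
 * Release every entry whose group can be loaded and skip the rest.
 * Returns the number of skipped entries instead of aborting, so the
 * caller can decide how to report the error.
 */
static size_t model_discard(struct model_pa *pas, size_t n, int bad_group)
{
	size_t skipped = 0;
	size_t i;

	assert(pas != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (pas[i].released)
			continue;
		if (model_load_group(pas[i].group, bad_group) != 0) {
			skipped++;	/* persistent failure: leave entry alone */
			continue;
		}
		pas[i].released = 1;
	}
	return skipped;
}
```

Calling model_discard() a second time with the same bad group skips the
same entry again, which is why a bounded retry followed by BUG() would
end in BUG() for any persistent I/O error.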


--

From: jing zhang
Date: Friday, March 19, 2010 - 7:17 am

And is there still a chance that
       call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback);
gets called again later on? (I am not yet sure that the chance exists.)

If there is no chance, what happens to the kmem_cache subsystem then?
After a reboot, is the file system still reliable, or does it just have
a few lost blocks?

So it is necessary, at least for me, to find out whether the chance exists.
                                      - zj
--

From: Andreas Dilger
Date: Friday, March 19, 2010 - 10:27 am

Exactly, which is the reason why it should not cause the system to  
hang.  The filesystem should handle such errors gracefully if this is  
possible, return an error to the application, and/or marking the  


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

--

From: jing zhang
Date: Saturday, March 20, 2010 - 7:05 am

Evening,

Thanks, Andreas and Ted, for your explanations of how to handle the error
gracefully. I see now that the chance may exist, since the pa is not yet
deleted from its group_list.

It also seems that there is still work worth doing here.
       - zj

---

--- linux-2.6.32/fs/ext4/mballoc.c	2009-12-03 11:51:22.000000000 +0800
+++ fs/mballoc.c	2010-03-20 21:40:04.000000000 +0800
@@ -3788,14 +3788,14 @@ repeat:
 		err = ext4_mb_load_buddy(sb, group, &e4b);
 		if (err) {
 			ext4_error(sb, __func__, "Error in loading buddy "
-					"information for %u", group);
+			"information for group %u inode %lu", group, inode->i_ino);
 			continue;
 		}

 		bitmap_bh = ext4_read_block_bitmap(sb, group);
 		if (bitmap_bh == NULL) {
 			ext4_error(sb, __func__, "Error in reading block "
-					"bitmap for %u", group);
+			"bitmap for group %u inode %lu", group, inode->i_ino);
 			ext4_mb_release_desc(&e4b);
 			continue;
 		}
@@ -3811,6 +3811,14 @@ repeat:
 		list_del(&pa->u.pa_tmp_list);
 		call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback);
 	}
+	if (! list_empty(&list)) {
+		/*
+		 * we have to do something for the check in
+		 * the function, ext4_mb_discard_group_preallocations()
+		 */
+		list_for_each_entry(pa, &list, u.pa_tmp_list)
+			pa->pa_deleted = 0;
+	}
 	if (ac)
 		kmem_cache_free(ext4_ac_cachep, ac);
 }
--

From: Aneesh Kumar K. V
Date: Friday, March 26, 2010 - 1:37 am

Can you add a comment saying that if we fail to load the buddy or read the
block bitmap, we skip freeing the prealloc space, so we mark it undeleted?
The prealloc space is still removed from the inode, but it remains linked
to the group prealloc list via pa_group_list.


-aneesh
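
The state change described above can be sketched as a small userspace
model. It is an illustration under stated assumptions, not the kernel
implementation: struct model_pa and both functions are hypothetical, the
boolean fields only mimic the roles of pa_deleted, the inode list and
pa_group_list, and the group-level function assumes (as the patch comment
suggests for ext4_mb_discard_group_preallocations()) that entries already
marked deleted are skipped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a preallocation entry; not the kernel struct. */
struct model_pa {
	bool deleted;		/* plays the role of pa_deleted */
	bool on_inode_list;	/* linked on the inode's prealloc list */
	bool on_group_list;	/* linked via the group prealloc list */
	bool freed;		/* space actually released */
};

/* Inode-level discard; load_ok == false simulates a buddy/bitmap failure. */
static void model_inode_discard(struct model_pa *pa, bool load_ok)
{
	assert(pa->on_inode_list);
	pa->deleted = true;		/* claimed for deletion */
	pa->on_inode_list = false;	/* always unlinked from the inode */
	if (!load_ok) {
		/* Freeing was skipped: mark undeleted, keep the group link. */
		pa->deleted = false;
		return;
	}
	pa->on_group_list = false;
	pa->freed = true;
}

/* Group-level discard; returns true if this call freed the entry. */
static bool model_group_discard(struct model_pa *pa)
{
	if (!pa->on_group_list || pa->deleted)
		return false;	/* not ours to free, or claimed elsewhere */
	pa->deleted = true;
	pa->on_group_list = false;
	pa->freed = true;
	return true;
}
```

In this model, an entry that fails in the inode-level path stays reachable
through the group list, and a later group-level pass can still free it
only because the deleted flag was reset.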
--

From: jing zhang
Date: Friday, March 26, 2010 - 7:12 am

/*
 * the trick here is to mark the PAs undeleted,
 * since they are still on their pa_group_list.
 */

That is it, Aneesh.

I am still waiting for comments, if any, from Ted, since I am not sure
the trick is safe enough. And I am not able to deliver a better patch
tonight :(

             - zj
--
