Hi Greg, After merging the driver-core tree, today's linux-next build (x86_64 allmodconfig) failed like this: fs/ceph/msgpool.c: In function 'ceph_msgpool_put': fs/ceph/msgpool.c:173: error: implicit declaration of function 'kref_set' Caused by commit 10c5d9fdc9ba89606b34f01cbe6ea287abba7395 ("kref: remove kref_set") from the driver-core tree interacting with commit c2e552e76e2c6907ca50cd9a4b747a2e2e8c615e ("ceph: use kref for ceph_msg") from the ceph tree. I applied the following patch for today (which may not be correct): [Sage, if this patch is correct, it should be applied to the ceph tree.] From: Stephen Rothwell <sfr@canb.auug.org.au> Date: Wed, 17 Mar 2010 15:35:22 +1100 Subject: [PATCH] ceph: update for removal of kref_set Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> --- fs/ceph/msgpool.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/ceph/msgpool.c b/fs/ceph/msgpool.c index ca3b44a..030297f 100644 --- a/fs/ceph/msgpool.c +++ b/fs/ceph/msgpool.c @@ -170,7 +170,7 @@ void ceph_msgpool_put(struct ceph_msgpool *pool, struct ceph_msg *msg) msg->front.iov_len = pool->front_len; msg->hdr.front_len = cpu_to_le32(pool->front_len); - kref_set(&msg->kref, 1); /* retake a single ref */ + kref_init(&msg->kref); /* retake a single ref */ list_add(&msg->list_head, &pool->msgs); pool->num++; dout("msgpool_put %p reclaim %p, now %d/%d\n", pool, msg, -- 1.7.0 -- Cheers, Stephen Rothwell sfr@canb.auug.org.au http://www.canb.auug.org.au/~sfr/ --
On Wed, 17 Mar 2010 15:41:45 +1100 I would say this is correct. The msg has seen it's last_put and is being placed in the pool of free messages, so it needs to be in the same state that ceph_msg_new (called in __fill_msgpool) leaves it in. ceph_msg_new used kref_init, so ceph_msgpool_put should use kref_init too to match. It is a pity that this code cannot use mempool_t.... What if mempool_t were changed to only re-alloc the vector of pointers when it grew, or when it shrank to less than 1/2 it's current size. Would that reduce the frequency of allocations enough for you to be comfortable with it? i.e. always make the vector a power-of-2 size (which is what is probably allocated anyway) while the pool size might be less. ?? --
That would improve the situation, but still mean potentially large allocations (the pools can grow pretty big) that aren't strictly necessary. I can imagine a more modular mempool_t with an ops vector for adding/removing from the pool to cope with situations like this, but I'm not sure it's worth the effort? sage --
Hi guys, Thanks for the confirmation. -- Cheers, Stephen Rothwell sfr@canb.auug.org.au http://www.canb.auug.org.au/~sfr/
On Wed, 17 Mar 2010 08:51:49 -0700 (PDT) How big? mempools (and equivalents) should just be large enough to get you through a tight spot. The working assumption is that they will not normally be used. So 2 or 3 should normally be plenty. (looks at code) The only time you resize a ceph_mempool is in ceph_monc_do_statfs where you increment it, perform a synchronous statfs call on the network, then decrement the size of the mempool. How many concurrent statfs calls does it even make sense to make. I'm probably missing something obvious, but wouldn't it make sense to put that all under a mutex so there was only ever one outstanding statfs (per filesystem) - or maybe under a counting semaphore to allow some small number, and make sure to prime the mempool to cover that number. Then you would never resize a mempool at all. I notice that all other mempools that ceph uses are sensibly quite small and stay that way. NeilBrown --
You're right. In fact, after reviewing the code again, it looks like _none_ of the msgpools (current or planned) needs to get large anymore. A protocol change a month or two back made it possible to allocate space for the reply along with the request, which means the only remaining use for the pools is for low memory writeout and the handful of messages we might receive from servers without asking for them. (There used to be more of the msgpool resizing going on for other types of requests, but it's been mostly fixed up.) I'll work on cleaning up the request/reply instances to avoid pools altogether. The remaining pools should be small enough to use the standard mempool_t. Thanks for looking into this! sage --
