1. remove the 3rd arg to fsstack_copy_attr_all. There are no users for it: ecryptfs never used the 3rd arg; unionfs stopped using it a long time ago. Halcrow ok'ed this patch some time ago. 2. add necessary locking for 32-bit smp systems in fsstack_copy_inode_size (courtesy Hugh Dickins). 3. minor commenting style changes, and addition of copyrights which were missing. Acked-by: Mike Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> diff --git a/fs/ecryptfs/dentry.c b/fs/ecryptfs/dentry.c index 5e59658..4621f89 100644 --- a/fs/ecryptfs/dentry.c +++ b/fs/ecryptfs/dentry.c @@ -62,7 +62,7 @@ static int ecryptfs_d_revalidate(struct dentry *dentry, struct nameidata *nd) struct inode *lower_inode = ecryptfs_inode_to_lower(dentry->d_inode); - fsstack_copy_attr_all(dentry->d_inode, lower_inode, NULL); + fsstack_copy_attr_all(dentry->d_inode, lower_inode); } out: return rc; diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c index e238611..687819d 100644 --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -575,9 +575,9 @@ ecryptfs_rename(struct inode *old_dir, struct dentry *old_dentry, lower_new_dir_dentry->d_inode, lower_new_dentry); if (rc) goto out_lock; - fsstack_copy_attr_all(new_dir, lower_new_dir_dentry->d_inode, NULL); + fsstack_copy_attr_all(new_dir, lower_new_dir_dentry->d_inode); if (new_dir != old_dir) - fsstack_copy_attr_all(old_dir, lower_old_dir_dentry->d_inode, NULL); + fsstack_copy_attr_all(old_dir, lower_old_dir_dentry->d_inode); out_lock: unlock_rename(lower_old_dir_dentry, lower_new_dir_dentry); dput(lower_new_dentry->d_parent); @@ -910,7 +910,7 @@ static int ecryptfs_setattr(struct dentry *dentry, struct iattr *ia) rc = notify_change(lower_dentry, ia); out: - fsstack_copy_attr_all(inode, lower_inode, NULL); + fsstack_copy_attr_all(inode, lower_inode); return rc; } diff --git a/fs/ecryptfs/main.c b/fs/ecryptfs/main.c ...
On Mon, 21 Apr 2008 02:50:42 -0400
The defined(CONFIG_SMP) is wrong. The spinlock is here to protect
dst->i_blocks, but it can be corrupted via preemption on uniprocessor as
well. So a plain old
#if BITS_PER_LONG == 32
However, what about src->i_blocks? It is protected by src->i_lock. The
code as you have it here could read transient values.
Furthermore, i_lock is defined as an innermost lock, for protection of
inode internals. But here we're proposing "taking" inode->i_size_seqcount
inside i_lock. Not necessarily a problem, but it broke the old rule.
We're also doing a read_seqlock of a _different_ inode inside this inode's
i_lock. Again, this is not necessarily a problem (but it might be!) but it
adds complexity and needs thought.
Can we avoid having to think?
void fsstack_copy_inode_size(struct inode *dst, const struct inode *src)
{
blkcnt_t i_blocks;
loff_t i_size;
i_size = i_size_read(src);
spin_lock_32bit(&src->i_lock);
i_blocks = src->i_blocks;
spin_unlock_32bit(&src->i_lock);
i_size_write(dst, i_size);
spin_lock_32bit(&dst->i_lock)
dst->i_blocks = i_blocks;
spin_unlock_32bit(&dst->i_lock)
}
--Sorry, I keep putting off a reply until I can append a patch, but still won't get to do so today. Comments below. Some confusion here - which would have been avoided if I'd been so good as to put much-needed comment in the code of the original patch - sorry. There's two separate issues, i_size and i_blocks. i_size is what I was addressing by adding the #ifdef'ed spin_lock, i_blocks (in the CONFIG_LSF case more precisely than the 32-bit case) is an arguable problem that nobody noticed until you did so now. Here's my original patch comment to Erez: + LTP's iogen01 doio tests hang nicely on 32-bit SMP when /tmp is a unionfs + mount of a tmpfs. See the comment on i_size_write in linux/fs.h: it needs + to be locked, otherwise i_size_read can spin forever waiting for a lost + seqcount update. + + Most filesystems are already holding i_mutex for this, but unionfs calls + fsstack_copy_inode_size from many places, not necessarily holding i_mutex. + Use the low-level i_lock within fsstack_copy_inode_size when 32-bit SMP. So far as that goes, the CONFIG_SMP is correct, because i_size_write does preempt_disable/preempt_enable in the 32-bit PREEMPT but not SMP case. The ifdef chosen reflects that in i_size_write, but I've no objection to removing the CONFIG_SMP part of it if it ends up more palatable that way - I imagine the cost of a double preempt_disable where a single would do is just too minuscule to worry about. But the new use of i_lock, together with the way I only unlock after dealing with i_blocks (I thought, if we have to have preemption off, let's do i_blocks too while it's still off), led you to think that I was trying to keep the two halves of a CONFIG_LSF i_blocks together - whereas I'd never even glanced at the typedef of i_blocks. Personally I don't think it's worth worrying about i_blocks coherency here. stat's generic_fillattr doesn't worry about it. unionfs (ah, but here we're in general stackable filesystem territory) won't very often be handling files w...
In message <20080430101704.9cbd6384.akpm@linux-foundation.org>, Andrew Morton writes: I can't find spin_[um]lock_32bit anywhere (checkd latest mmotm and linus's tree). I therefore assume this was just your way of saying I should: #if BITS_PER_LONG == 32 spin_unlock(&dst->i_lock); #endif Erez. --
On Thu, 1 May 2008 13:18:09 -0400 Nope, it was my way of suggesting that you implement it ;) include/linux/spinlock.h would be a good place. --
Sure, I can implement it and submit a patch. But I've got two questions
first:
1. i_size_read has a tri-state behaviour:
#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
// play with seqcount
#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPT)
// wrap in preempt_disable/enable
#else
// just return value
#endif
Shouldn't the same be done for i_blocks, with matching i_blocks_read(),
i_blocks_write(), and adding an inode->i_blocks_seqcount field?
2. I've rewritten your suggested code a bit to reduce stack use. Modulo
having 32-bit spin_lock/unlock variants, do you see any problem with this
code below? My testing of it so far on 32/64-bit SMP/UMP have all
passed.
void fsstack_copy_inode_size(struct inode *dst, struct inode *src)
{
#if BITS_PER_LONG == 32
blkcnt_t i_blocks;
spin_lock(&src->i_lock);
i_blocks = src->i_blocks;
spin_unlock(&src->i_lock);
spin_lock(&dst->i_lock);
dst->i_blocks = i_blocks;
spin_unlock(&dst->i_lock);
#else
dst->i_blocks = src->i_blocks;
#endif
i_size_write(dst, i_size_read(src));
}
Thanks,
Erez.
--On Thu, 1 May 2008 19:44:18 -0400 I guess so, yes. Maybe there are other things too. We just haven't cared enough to do all this fuss for i_blocks. Even i_size was a bit marginal. i_size is much more important because glitches in there can result in incorrect data being returned from read() and things like that. i_blocks That looks sane, as long as we don't care about i_size-vs-i_blocks coherency. However I expect that approximately zero of the sites which modify i_blocks are taking i_lock to do so. otoh, many of them _will_ accidentally have i_mutex coverage. On the write() path at least. What happened to that idea? --
If i_blocks is indeed less important than i_size, then we can live with some
incoherency b/t i_size and i_blocks, for now. Nevertheless, I propose
adding this to linux/fs.h:
static inline blkcnt_t i_blocks_read(const struct inode *inode)
{
#if BITS_PER_LONG == 32
blkcnt_t i_blocks;
spin_lock(&src->i_lock);
i_blocks = src->i_blocks;
spin_unlock(&src->i_lock);
return i_blocks;
#else
return src->i_blocks;
#endif
}
and a matching i_blocks_write function. We can then gradually convert those
"unsafe" users of i_blocks to use the new i_blocks_read/write helpers.
The nice thing about these two helpers is fsstack_copy_inode_size becomes a
lot cleaner and more elegant:
void fsstack_copy_inode_size(struct inode *dst, struct inode *src)
{
i_blocks_write(dst, i_blocks_read(src));
i_size_write(dst, i_size_read(src));
}
And, if we ever wanted to ensure coherency b/t i_blocks and i_size, we'll
need to create helpers that merge the functionality of i_size_read/write and
i_blocks_read/write.
What do you think?
Erez.
--While this function looks simple, I think it is more generic and better that the caller of fsstack_copy_inode_size() holds i_lock instead of lock it in the callee. Junjiro Okajima --
We actually only need the spinlocked version if blkcnt_t is 64-bit. So #if BITS_PER_LONG == 32 && defined(CONFIG_LSF), plus explanatory comment. You'll also need i_blocks_mod() for things like --
Thanks. I can't say that I'm an expert in these SMP issues. But I'll run your rewritten function through my 32 and 64-bit SMP and non-SMP systems, and see how it behaves. Erez. --
On Wed, 30 Apr 2008 17:09:15 -0400 The obvious risk here is that there's no synchronisation between the copying of i_size and i_blocks. If that's a problem, I _expect_ that i_mutex wold give pretty good coverage (but insufficient for mmap-write-over-a-hole, I guess). And someone needs to write spin_lock_32bit() ;) --
| Netfilter kernel module | 8 hours ago | Linux kernel |
| serial driver xmit problem | 10 hours ago | Linux kernel |
| Why Windows is better than Linux | 10 hours ago | Linux general |
| How can I see my kernel messages in vt12? | 17 hours ago | Linux kernel |
| Grub | 1 day ago | Linux general |
| vmalloc_fault handling in x86_64 | 1 day ago | Linux kernel |
| epoll_wait()ing on epoll FD | 1 day ago | Linux kernel |
| Framebuffer in x86_64 causes problems to multiseat | 1 day ago | Linux kernel |
| Difference between 2.4 and 2.6 regarding thread creation | 1 day ago | Linux general |
| Compiling gfs2 on kernel 2.6.27 | 2 days ago | Linux kernel |
