linux-fsdevel mailing list

From | Subject | Date
Neil Brown
RE: Software raid0 will crash the file-system, when each dis...
Yes, I meant 2T, and yes, the components are always over 2T. So I'm at a complete loss. The raid0 code follows the same paths and does the same things and uses 64bit arithmetic where needed. So I have no idea how there could be a difference between these two cases. I'm at a loss... NeilBrown -
May 16, 10:45 pm 2007
Jeff Zheng
RE: Software raid0 will crash the file-system, when each dis...
I tried the patch; the same problem shows up, but there is no bug_on report. Is there anything else I can do? -
May 16, 11:11 pm 2007
Neil Brown
RE: Software raid0 will crash the file-system, when each dis...
Thanks. Everything looks fine here. The only difference of any significance between the working and non-working configurations is that in the non-working, the component devices are larger than 2Gig, and hence have sector offsets greater than 32 bits. This does cause a slightly different code path in one place, but I cannot see it making a difference. But maybe it does. What architecture is this running on? What C compiler are you using? Can you try with this patch? It is the only thing...
May 16, 8:48 pm 2007
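The concern about sector offsets that no longer fit in 32 bits can be illustrated with a small sketch. It assumes 512-byte sectors and reads the 2T boundary mentioned elsewhere in the thread as 2 TiB; the function names are hypothetical and this is not the md/raid0 code.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a byte offset on a device into a 64-bit sector number. */
uint64_t byte_to_sector(uint64_t byte_offset, uint32_t sector_size)
{
    assert(sector_size > 0);
    return byte_offset / sector_size;
}

/* Returns 1 if the sector number is representable in 32 bits, else 0. */
int sector_fits_32bit(uint64_t sector)
{
    return sector <= UINT32_MAX;
}

/* What survives if a sector number passes through a 32-bit variable:
 * the high bits are silently dropped. */
uint32_t truncate_sector(uint64_t sector)
{
    return (uint32_t)sector;
}
```

With 512-byte sectors, a byte offset of 2 TiB corresponds to sector 2^32, which is the first sector number that does not fit in 32 bits. Any intermediate 32-bit variable on that path would wrap back to a low sector number rather than fail loudly.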
Jeff Zheng
RE: Software raid0 will crash the file-system, when each dis...
Do you mean 2T here? But in both configurations, the component devices are I386(i686) Gcc 4.0.2 20051125, OK, I will try the patch and post the result. Best Regards Jeff Zheng -
May 16, 10:09 pm 2007
Neil Brown
Re: + knfsd-exportfs-add-exportfsh-header-fix.patch added to...
You need to be able to map a dentry to a filehandle (you get about 20 bytes) and back again. If CIFS provides some fixed-length identifier for files, then you might be able to do it. If not, cannot really do it at all. And I suspect the latter. (There are other requirements, like get_parent, but we could probably work around those if we really needed to). Theoretically, you could make it work with NFSv4 and volatile file handles, but I doubt it would really work in practice. I don't think the "...
May 16, 8:05 pm 2007
Steven French
Re: + knfsd-exportfs-add-exportfsh-header-fix.patch added to...
Most CIFS servers (Windows on NTFS, Samba etc.) can return a "unique identifier" (a 64 bit inode number), in conjunction with the volume id, that is probably good enough ... right? This can be returned on various calls (level 0x03EE "file_internal_info" - returns only this number). If reverse lookup is required - ie given a "unique identifier" what is its path name - there are probably a few different ways to handle this but presumably local filesystems run into the same issue. Steve Fr...
May 16, 11:11 pm 2007
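As a rough illustration of the idea discussed in these two messages, the sketch below packs a volume id and a 64-bit unique identifier into a fixed-length byte string that fits within the roughly 20 bytes available, and unpacks it again. The 32-bit width of the volume id, the big-endian layout, and all names (`file_id`, `fh_encode`, `fh_decode`) are assumptions made for illustration; this is not the kernel's export_operations interface, and it does not address the reverse lookup from identifier to dentry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FH_MAX_BYTES  20  /* "about 20 bytes", per the thread */
#define FH_USED_BYTES 12  /* assumption: 4-byte volume id + 8-byte file id */

struct file_id {
    uint32_t volume_id;   /* assumed width */
    uint64_t unique_id;   /* the 64-bit "unique identifier" */
};

/* Pack id into buf (big-endian). Returns bytes written, or 0 if buf is too small. */
size_t fh_encode(const struct file_id *id, uint8_t *buf, size_t len)
{
    size_t i;

    assert(id != NULL && buf != NULL);
    if (len < FH_USED_BYTES)
        return 0;
    for (i = 0; i < 4; i++)
        buf[i] = (uint8_t)(id->volume_id >> (24 - 8 * i));
    for (i = 0; i < 8; i++)
        buf[4 + i] = (uint8_t)(id->unique_id >> (56 - 8 * i));
    return FH_USED_BYTES;
}

/* Unpack buf into out. Returns 0 on success, -1 if buf is too short. */
int fh_decode(const uint8_t *buf, size_t len, struct file_id *out)
{
    size_t i;

    assert(buf != NULL && out != NULL);
    if (len < FH_USED_BYTES)
        return -1;
    out->volume_id = 0;
    out->unique_id = 0;
    for (i = 0; i < 4; i++)
        out->volume_id = (out->volume_id << 8) | buf[i];
    for (i = 0; i < 8; i++)
        out->unique_id = (out->unique_id << 8) | buf[4 + i];
    return 0;
}
```

The encoding half is the easy part. Whether a server can map such an identifier back to a file is the open question raised in the thread.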
Vladimir V. Saveliev
Re: [patch 30/41] reiserfs convert to new aops.
Hello, Nick This is new version of the patch. reiserfs_prepare_write and reiserfs_commit_write are still there, but they do not show themselves in any struct address_space_operations instance. xattrs and ioctl use them directly.
May 16, 1:22 pm 2007
Jörn
[PATCH resend] introduce I_SYNC
While others are busy coming up with new silly names, here is something substantial to worry about. Patch fixes a deadlock problem well enough for LogFS to survive. The problem itself is generic and seems to be ancient. Shaggy has code in JFS from about 2.4.20 that seems to work around the deadlock. Dave Chinner indicated that this could cause latency problems (not a deadlock) on the NFS server side. I still suspect that NTFS has hit the same deadlock and its current "fix" will cause data cor...
May 16, 1:01 pm 2007
Andrew Morton
Re: [PATCH resend] introduce I_SYNC
On Wed, 16 May 2007 19:01:14 +0200 Jörn Engel <joern@lazybastard.org> gack, you like sticking your head in dark and dusty holes. If we're going to do this then please let's get some exhaustive commentary in there so that others have a chance of understanding these flags without having to do the amount of reverse-engineering which you've been put through. -
May 16, 1:15 pm 2007
Jörn
Re: [PATCH resend] introduce I_SYNC
I don't remember enjoying the experience too much. For a day or two I through. Not sure how the monasteries would cope with that. Jörn -- I've never met a human being who would want to read 17,000 pages of documentation, and if there was, I'd kill him to get him out of the gene pool. -- Joseph Costello -
May 16, 1:47 pm 2007
David Howells
[PATCH] AFS: Implement shared-writable mmap [try #2]
Implement shared-writable mmap for AFS. The key with which to access the file is obtained from the VMA at the point where the PTE is made writable by the page_mkwrite() VMA op and cached in the affected page. If there's an outstanding write on the page made with a different key, then page_mkwrite() will flush it before attaching a record of the new key. [try #2] Only flush the page if the page is still part of the mapping (truncate may have discarded it). Signed-off-by: David Howells <dho...
May 16, 6:02 am 2007
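The flush-on-key-change rule in the patch description can be modelled in a few lines. This is a toy model only: `model_page`, `model_flush` and `model_page_mkwrite` are hypothetical names, keys are reduced to integers, and returning -1 for a page that has left the mapping is an assumption about how a caller might be told to retry. It is not the AFS code.

```c
#include <assert.h>
#include <stddef.h>

struct model_page {
    int in_mapping;   /* 0 once truncate has discarded the page */
    int has_pending;  /* an outstanding write is recorded on the page */
    int pending_key;  /* key that the outstanding write was made with */
    int flush_count;  /* number of flushes performed, for illustration */
};

/* Stand-in for writing the page back using its recorded key. */
void model_flush(struct model_page *pg)
{
    pg->flush_count++;
    pg->has_pending = 0;
}

/* Called when the page is about to be made writable with the given key.
 * Returns 0 on success, -1 if the page is no longer part of the mapping. */
int model_page_mkwrite(struct model_page *pg, int key)
{
    assert(pg != NULL);
    if (!pg->in_mapping)
        return -1;              /* truncated: do not flush, do not attach */
    if (pg->has_pending && pg->pending_key != key)
        model_flush(pg);        /* different key: flush the old write first */
    pg->has_pending = 1;
    pg->pending_key = key;      /* cache the key in the page */
    return 0;
}
```

Writing twice with the same key causes no flush; switching keys causes exactly one flush before the new key is recorded.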
Nick Piggin
Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
I would strongly suggest you used (0, PAGE_CACHE_SIZE) for the range, and have your nopage function DTRT. Minor issue: you can just check for `if (!page->mapping)` for truncation, which is the usual signal to tell the reader you're checking for truncate. Rather than add this (not always correct) comment about the VM workings, I'd just add a directive in the page_mkwrite API documentation that the filesystem -- SUSE Labs, Novell Inc. -
May 16, 8:07 am 2007
David Howells
Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
That tells prepare_write() that the region to be modified is the whole page - which is incorrect. We're going to change a little bit of it. Hmmm... Thinking about it again, I probably shouldn't be using afs_prepare_write() at all. afs_prepare_write() does two things: (1) Fills in the bits around the edges of the region to be modified if the page is not up to date. (2) Records the region of the page to be modified. If afs_prepare_write() function is invoked by write(), then the regi...
May 16, 9:16 am 2007
Nick Piggin
Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
You don't know how much you're going to change, but it could be anything in the range of 0 to PAGE_CACHE_SIZE. Clearly a (0, PAGE_CACHE_SIZE) Oh god you're doing ClearPageUptodate directly on pagecache pages? In general (modulo bugs and crazy filesystems), you're not allowed to have !uptodate pages mapped into user addresses because that implies the user would be allowed to see garbage. If you follow that rule, then you can never have a !uptodate page be passed into page_mkwrite (unless it ha...
May 16, 9:32 am 2007
David Howells
Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
This situation I have to deal with is a tricky one. Consider: (1) User A modifies a page with his key. This change gets made in the pagecache, but is not written back immediately. (2) User B then wants to modify the same page, but with a different key. This means that afs_prepare_write() has to flush A's writes back to the server before B is permitted to write. (3) The flush fails because A is no longer permitted to write to that file. This means that the change in the...
May 16, 12:12 pm 2007
Nick Piggin
Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
Which ways? Are you talking about prepare_write being called from page_mkwrite, or anywhere? More generally it sounds like a nasty thing to have a writeback cache if it can become incoherent (due to dirty pages that subsequently cannot be written back) without notification. Have you tried doing a write-through one? You may be clearing PG_uptodate, but isn't there still an underlying problem that you can have mappings to the page at that point? If that isn't a problem truncate_complete_page ...
May 16, 12:32 pm 2007
David Howells
Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
Because invalidate_inode_pages() forcibly removes the dirty flag from each page in the inode and then calls invalidatepage() - and thus they don't get written back, but some of those pages may contain writes from other processes. The whole inode isn't owned by one user at a time. I hadn't considered invalidate_inode_pages_range(), but that suffers from the (1) prepare_write() is called with the target page locked and does not release the lock. The truncation routines lock the page prior ...
May 16, 12:56 pm 2007
Nick Piggin
Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
It had better not. We use that sucker to nuke pagecache when we're trying to You can drop the lock, do the invalidation, and return AOP_TRUNCATED_PAGE. The I just mean more generally. simple write(2) writes, for starters. For shared writable mmap? I don't know... does POSIX require mmap data That's what the invalidate / truncate routines do. -- SUSE Labs, Novell Inc. -
May 16, 1:28 pm 2007
David Howells
Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
Hmmm... There's a danger of incurring a race by doing that. Consider two processes both trying to write to a dirty page for which writeback will be rejected: (1) The first process gets EKEYREJECTED from the server, drops its lock and is then preempted. (2) The second process gets EKEYREJECTED from the server, drops its lock, truncates the page, reloads the page and modifies it. (3) The first process resumes and truncates the page, thereby splatting the second process's wr...
May 16, 2:45 pm 2007
David Howells
Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
You can't do write-through caching for shared-writable mmap because the writes go directly into the pagecache once the page is made writable, at least, short of instruction emulation. At some point in the future we'll be asked to turf the data back to the I suspect so, but I don't know offhand. I want it to be coherent anyway, otherwise it's inconsistent with OpenAFS and Arla (or at least more so). Note also that the choice of write-through or write-back caching also has implications for loc...
May 16, 1:46 pm 2007
Nick Piggin
Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
So did we just get your issues sorted? I _think_ *snip* is the Howells code for "OK", but I can never be sure ;) FWIW, as a rule, ClearPageUptodate should never be done by anyone, least of all a filesystem on regular file pagecache. I need to go through and audit this stuff... but so much backlog :P Anyway, *snip* the side discussion. -- SUSE Labs, Novell Inc. -
May 16, 1:59 pm 2007
Christoph Hellwig
Re: + knfsd-exportfs-add-exportfsh-header-fix.patch added to...
Looks like cifs has grown an export_operations table since I did the patch. But with only a get_parent method that returns an error it's not useful at all, so we should rather remove the whole file: Signed-off-by: Christoph Hellwig <hch@lst.de> Index: linux-2.6/fs/cifs/cifsfs.c =================================================================== --- linux-2.6.orig/fs/cifs/cifsfs.c 2007-05-16 07:55:35.000000000 +0200 +++ linux-2.6/fs/cifs/cifsfs.c 2007-05-16 07:55:50.000000000 +0200 @...
May 16, 2:57 am 2007
Steven French
Re: + knfsd-exportfs-add-exportfsh-header-fix.patch added to...
Any ideas what are the minimum export operation(s) that cifs would need to add to export under nfsd? It was not clear to me after reading the Exporting document in Documentation directory. (some users had wanted to export files from Windows servers to nfs clients files by putting an nfs server mounted over cifs in between - I realize that this can corrupt data due to nfs client caching etc., as even in some cases could happen if you try to export a cluster file system under nfsd). Steve F...
May 16, 10:55 am 2007
J. Bruce Fields May 16, 12:02 pm 2007
Steven French
Re: + knfsd-exportfs-add-exportfsh-header-fix.patch added to...
I thought that until a few days ago, a sequence like the following (two nfs servers exporting the same clustered data) on client 1 lock range A through B of file1 (exported from nfs server 1) on client 2 lock range A through C of file 1 (exported from nfs server 2) on client 1 write A through B on client 2 write A through C on client 1 unlock A through B on client 2 unlock A through C would corrupt data (theoretically could be fixed as nfsd calls lock methods [ message continues ]
http://git.kernel.org/?p=linux...
May 16, 1:03 pm 2007
J. Bruce Fields
Re: + knfsd-exportfs-add-exportfsh-header-fix.patch added to...
Hm. How could nfsd get stale metadata? I'm just (probably naively) assuming that a "cluster" filesystem attempts to provide much higher cache consistency than actually necessary to keep nfs clients happy. But, if not, it would be nice to understand the problem. --b. -
May 16, 5:33 pm 2007
Christoph Hellwig
Re: + knfsd-exportfs-add-exportfsh-header-fix.patch added to...
That patch was missing the Makefile hunk, here's the proper one: (I wish there was a way to avoid having to do quilt add every time) Signed-off-by: Christoph Hellwig <hch@lst.de> Index: linux-2.6/fs/cifs/cifsfs.c =================================================================== --- linux-2.6.orig/fs/cifs/cifsfs.c 2007-05-16 07:55:35.000000000 +0200 +++ linux-2.6/fs/cifs/cifsfs.c 2007-05-16 07:55:50.000000000 +0200 @@ -49,10 +49,6 @@ static struct quotactl_ops cifs_quotactl_ops; #en...
May 16, 8:53 am 2007
Bill Davidsen
Re: Software raid0 will crash the file-system, when each dis...
If I read this correctly, the problem is with JFS rather than RAID? Have you tried not mounting the JFS filesystem but just starting the array which crashes, so you can read bits of it, etc, and verify that the array itself is working? And can you run an fsck on the filesystem, if that makes sense? I assume you got to actually write a f/s at one time, and I've never used JFS under Linux. I spent five+ years using it on AIX, though, complex but -- bill davidsen <davidsen@tmr.com> ...
May 16, 1:28 pm 2007
david
Re: Software raid0 will crash the file-system, when each dis...
he had the same problem with xfs. David Lang
May 16, 1:58 pm 2007
Andreas Dilger
Re: Software raid0 will crash the file-system, when each dis...
Check if your kernel has CONFIG_LBD enabled. The kernel doesn't check if the block layer can actually write to a block device > 2TB. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. -
May 16, 10:04 am 2007
david
Re: Software raid0 will crash the file-system, when each dis...
My experience is that if you don't have CONFIG_LBD enabled then the kernel will report the larger disk as 2G and everything will work, you just won't get all the space. plus he seems to be crashing around 500G of data and finally (if I am reading the post correctly) if he configures the drives as 4x2.2TB=11TB instead of 2x5.5TB=11TB he doesn't have the same problem. I'm getting ready to setup a similar machine that will have 3x10TB (3 15 disk arrays with 750G drives), but won't be ready...
May 16, 2:04 pm 2007
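To see why the size of each component matters for striping, here is a simplified sketch of how an array sector could be mapped to a member disk and an offset on that disk. It assumes equal-sized members and a fixed chunk size; the real md raid0 driver also handles zones of unequal devices, so this is not its implementation, and `raid0_map` and `stripe_target` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

struct stripe_target {
    unsigned disk;        /* index of the member disk */
    uint64_t dev_sector;  /* sector offset on that disk */
};

/* Map a sector of the striped array onto (disk, sector-on-disk). */
struct stripe_target raid0_map(uint64_t array_sector,
                               uint32_t chunk_sectors,
                               unsigned ndisks)
{
    struct stripe_target t;
    uint64_t chunk, offset;

    assert(chunk_sectors > 0 && ndisks > 0);
    chunk = array_sector / chunk_sectors;   /* which chunk of the array */
    offset = array_sector % chunk_sectors;  /* position inside that chunk */
    t.disk = (unsigned)(chunk % ndisks);    /* chunks rotate across disks */
    t.dev_sector = (chunk / ndisks) * chunk_sectors + offset;
    return t;
}
```

With two disks and 128-sector chunks, array sector 2^33 + 5 lands on disk 0 at device sector 2^32 + 5, so the per-device offset itself needs more than 32 bits once each member is larger than 2 TiB (assuming 512-byte sectors).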
Jeff Zheng
RE: Software raid0 will crash the file-system, when each dis...
You will definitely meet the same problem. As very large hardware disks become more and more popular, this will become a big issue for software raid. Jeff -----Original Message----- From: david@lang.hm [mailto:david@lang.hm] Sent: Thursday, 17 May 2007 6:04 a.m. To: Andreas Dilger Cc: Jeff Zheng; linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org Subject: Re: Software raid0 will crash the file-system, when each disk is 5TB My experience is that if you don't have CONFIG_LB...
May 16, 5:44 pm 2007
Jan Engelhardt
Re: Software raid0 will crash the file-system, when each dis...
You could emulate it with VMware. Big disks are quite "cheap" when they are not allocated. Jan --
May 16, 2:16 pm 2007
Jeff Zheng
RE: Software raid0 will crash the file-system, when each dis...
The problem is that it only happens when you actually write data to the raid. You need the actual space to reproduce the problem. Jeff -----Original Message----- From: Jan Engelhardt [mailto:jengelh@linux01.gwdg.de] Sent: Thursday, 17 May 2007 6:17 a.m. To: david@lang.hm Cc: Andreas Dilger; Jeff Zheng; linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org Subject: Re: Software raid0 will crash the file-system, when each disk is 5TB few more days. You could emulate it with VMware. B...
May 16, 5:42 pm 2007
Dave Kleikamp
Re: [PATCH 1/5][TAKE3] fallocate() implementation on i86, x8...
i_blocks will be updated, so it seems reasonable to update ctime. mtime shouldn't be changed, though, since the contents of the file will be -- David Kleikamp IBM Linux Technology Center -
May 16, 8:21 am 2007
David Chinner
Re: [PATCH 1/5][TAKE3] fallocate() implementation on i86, x8...
That's assuming blocks were actually allocated - if the prealloc range already has underlying blocks there is no change and so we should not be changing mtime either. Only the filesystem will know if it has changed the file, so I think that timestamp updates need to be driven down to that level, not done blindly at the highest layer.... Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group
May 16, 7:40 pm 2007
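One reading of the timestamp discussion in this subthread is that the decision belongs in the filesystem, which alone knows whether preallocation changed anything. The sketch below models that policy under stated assumptions: the `model_inode` structure, the `model_prealloc_times` helper and the integer timestamps are all hypothetical, and nothing here is taken from the fallocate patches themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_inode {
    uint64_t blocks;  /* allocated blocks, in the role of i_blocks */
    uint64_t ctime;   /* inode change time */
    uint64_t mtime;   /* data modification time */
};

/* Apply the result of a preallocation request.
 * new_blocks is how many blocks the filesystem actually allocated;
 * it is 0 when the range was already backed by blocks. */
void model_prealloc_times(struct model_inode *inode,
                          uint64_t new_blocks, uint64_t now)
{
    assert(inode != NULL);
    if (new_blocks == 0)
        return;                /* nothing changed: leave both timestamps */
    inode->blocks += new_blocks;
    inode->ctime = now;        /* metadata (block count) changed */
    /* mtime is left alone: preallocation does not alter file contents */
}
```

A generic layer that stamped ctime on every call could not distinguish the two cases, which is the argument for pushing the update down to the filesystem.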
Amit K. Arora
Re: [PATCH 1/5][TAKE3] fallocate() implementation on i86, x8...
I agree. Thus the ctime should change for FA_PREALLOCATE mode also (which does not change the file size) - if we end up having this additional mode in near future. -- Regards, -
May 16, 8:37 am 2007
Amit K. Arora
Re: [PATCH 1/5][TAKE3] fallocate() implementation on i86, x8...
I think ->fallocate() should return a "long", since sys_fallocate() has to return what ->fallocate() returns and hence their return type should I will change the ext4_fallocate() to return a "long" (in patch 4/5) in the next post. Agree ? Thanks! -- Regards,
May 16, 8:31 am 2007
Christoph Hellwig
Re: [PATCH 2/2] AFS: Implement shared-writable mmap
It looks like you really want Dave's generic page_mkwrite. And we should really get that one merged for 2.6.22. -
May 16, 3:10 am 2007
Nick Piggin
Re: [PATCH 2/2] AFS: Implement shared-writable mmap
Dave asked me about that the other day (in relation to the ->fault ops)... I have no problem merging it for 2.6.22 and rebasing my patches on top. -- SUSE Labs, Novell Inc. -
May 16, 3:17 am 2007
Hugh Dickins
Re: [PATCH 2/2] AFS: Implement shared-writable mmap
That's right, the overhead of the lock_page()/unlock_page() in the common path of faulting, and of the extra call to unmap_mapping_range() when truncating (because page lock doesn't satisfactorily replace the old So far, yes. I expect it'll surface in some reallife workload sometime, but let's not get too depressed about that. I guess It is a pity to be adding overhead to a common path in order to fix Again, rather too blithely said. You have a deep well of ingenuity, Getting a "yes" or ...
May 16, 12:36 pm 2007
Nick Piggin
Re: [PATCH 2/2] AFS: Implement shared-writable mmap
I say I believe scalability will not be a huge issue, because for concurrent faulters on the same page, they still have cacheline contention beginning before we lock the page (tree_lock), and ending after we unlock it (page->_count), and a few others in the middle for good measure. I sure don't think it is going to help, but I don't think it would be a great impact on an already sucky workload. We would have contention against other sources of lock_page, but OTOH, we want to fix up the clear_...
May 16, 1:14 pm 2007
Hugh Dickins
Re: [PATCH 2/2] AFS: Implement shared-writable mmap
I'm hoping you intended one less negative ;) -
May 16, 1:26 pm 2007
David Howells
Re: [PATCH 2/2] AFS: Implement shared-writable mmap
Or did you mean one fewer negative? :-) David -
May 16, 1:48 pm 2007
Chuck Ebbert May 16, 1:34 pm 2007
Nick Piggin May 16, 1:30 pm 2007
Pavel Machek
Re: [PATCH] LogFS take three
Hi! Please just delete it, not comment it out like this. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -
May 16, 3:17 pm 2007
Jörn
Re: [PATCH] LogFS take three
I know. Top 3 items of my todo list are: - Handle system crashes - Add second journal That will get resurrected, even before the move to userspace. I had to change the filesystem format for compression support and this is an artifact of the transition phase. Jörn -- Ninety percent of everything is crap. -- Sturgeon's Law -
May 16, 3:23 pm 2007
Pekka J Enberg
Re: [PATCH] LogFS take three
Note that BUG() can be a no-op so dumping something on disk might not make sense there. This seems useful, but you probably need to make this bit more generic so that using BUG() proper in your filesystem code does the Please drop this wrapper function. It's better to open-code the error This looks fishy. All reads and writes are serialized by compr_mutex Seems wasteful to first read the data in a scratch buffer and then memcpy() it immediately for the COMPR_NONE case. Any reason why ...
May 16, 6:21 am 2007
Jörn
Re: [PATCH] LogFS take three
Hmm. I am not sure how this could be generic and still make sense. LogFS has some unused write-once space in the superblock segment. Embedded people always have problems debugging and were suggesting using this to store debug information. That allows me to ask for a filesystem image and get both the offending image plus a crash dump. It also allows me to abort mounting if I ever see an existing crash dump (not implemented yet). "First failure data capture" was an old IBM slogan and the ...
May 16, 8:26 am 2007