| From | Subject | Date |
|---|---|---|
| Neil Brown | RE: Software raid0 will crash the file-system, when each dis...
Yes, I meant 2T, and yes, the components are always over 2T. So I'm
at a complete loss. The raid0 code follows the same paths and does
the same things and uses 64bit arithmetic where needed.
So I have no idea how there could be a difference between these two
cases.
I'm at a loss...
NeilBrown
-
| May 16, 10:45 pm 2007 |
| Jeff Zheng | RE: Software raid0 will crash the file-system, when each dis...
I tried the patch; the same problem shows up, but with no BUG_ON report.
Is there anything else I can do?
-
| May 16, 11:11 pm 2007 |
| Neil Brown | RE: Software raid0 will crash the file-system, when each dis...
Thanks.
Everything looks fine here.
The only difference of any significance between the working and
non-working configurations is that in the non-working, the component
devices are larger than 2Gig, and hence have sector offsets greater
than 32 bits.
This does cause a slightly different code path in one place, but I
cannot see it making a difference. But maybe it does.
What architecture is this running on?
What C compiler are you using?
Can you try with this patch? It is the only thing...
| May 16, 8:48 pm 2007 |
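The concern about sector offsets that no longer fit in 32 bits can be illustrated with a small userspace sketch. It assumes 512-byte sectors, under which the 32-bit limit is reached at 2 TiB; the helper names are hypothetical and are not taken from the md/raid0 code.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u /* assumption: 512-byte sectors */

/* Hypothetical helper: byte offset -> sector number, kept in 64 bits. */
static uint64_t sector_of(uint64_t byte_offset)
{
    return byte_offset / SECTOR_SIZE;
}

/* The same sector number squeezed into a 32-bit variable. */
static uint32_t sector_of_truncated(uint64_t byte_offset)
{
    return (uint32_t)sector_of(byte_offset);
}

/* Non-zero when the sector number does not fit in 32 bits. */
static int needs_64bit_sector(uint64_t byte_offset)
{
    return sector_of(byte_offset) > UINT32_MAX;
}
```

With these definitions, `sector_of_truncated(1ULL << 41)` wraps to sector 0 although the real sector number is 2^32, which is the kind of silent aliasing that any stray 32-bit intermediate would produce on components larger than 2 TiB.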
| Jeff Zheng | RE: Software raid0 will crash the file-system, when each dis...
Do you mean 2T here? But in both configurations, the component devices are
I386(i686)
Gcc 4.0.2 20051125,
OK, I will try the patch and post the result.
Best Regards
Jeff Zheng
-
| May 16, 10:09 pm 2007 |
| Neil Brown | Re: + knfsd-exportfs-add-exportfsh-header-fix.patch added to...
You need to be able to map a dentry to a filehandle (you get about 20
bytes) and back again.
If CIFS provides some fixed-length identifier for files, then you might
be able to do it. If not, you cannot really do it at all. And I suspect
the latter. (There are other requirements, like get_parent, but we
could probably work around those if we really needed to).
Theoretically, you could make it work with NFSv4 and volatile file
handles, but I doubt it would really work in practice. I don't think
the "...
| May 16, 8:05 pm 2007 |
| Steven French | Re: + knfsd-exportfs-add-exportfsh-header-fix.patch added to...
Most CIFS servers (Windows on NTFS, Samba etc.) can return a "unique
identifier" (a 64 bit inode number), in conjunction with the volume id,
that is probably good enough ... right? This can be returned on various
calls (level 0x03EE "file_internal_info" - returns only this number). If
reverse lookup is required - ie given a "unique identifier" what is its
path name - there are probably a few different ways to handle this but
presumably local filesystems run into the same issue.
Steve Fr...
| May 16, 11:11 pm 2007 |
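As a rough illustration of the idea discussed in this exchange, a 64-bit unique identifier plus a volume id fits comfortably inside the roughly 20 bytes available for a file handle. The sketch below is an assumption-laden userspace model: the struct, the 12-byte layout, and the function names are hypothetical and are not the CIFS or nfsd API. It covers only packing and unpacking the identifier, not the reverse lookup from identifier to dentry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-length handle: 4-byte volume id + 8-byte unique id. */
#define FH_LEN 12

struct cifs_like_id {
    uint32_t volume_id;
    uint64_t unique_id;
};

/* Pack the identifier into a big-endian, fixed-length byte string. */
static void fh_encode(const struct cifs_like_id *id, unsigned char fh[FH_LEN])
{
    int i;

    for (i = 0; i < 4; i++)
        fh[i] = (unsigned char)(id->volume_id >> (24 - 8 * i));
    for (i = 0; i < 8; i++)
        fh[4 + i] = (unsigned char)(id->unique_id >> (56 - 8 * i));
}

/* Unpack a handle; returns 0 on success, -1 if the length is wrong. */
static int fh_decode(const unsigned char *fh, size_t len,
                     struct cifs_like_id *out)
{
    int i;

    if (len != FH_LEN)
        return -1;
    out->volume_id = 0;
    out->unique_id = 0;
    for (i = 0; i < 4; i++)
        out->volume_id = (out->volume_id << 8) | fh[i];
    for (i = 0; i < 8; i++)
        out->unique_id = (out->unique_id << 8) | fh[4 + i];
    return 0;
}

/* Helper for experimenting: encode then decode, report whether ids match. */
static int fh_roundtrip_ok(uint32_t volume_id, uint64_t unique_id)
{
    struct cifs_like_id in = { volume_id, unique_id }, out;
    unsigned char fh[FH_LEN];

    fh_encode(&in, fh);
    if (fh_decode(fh, sizeof(fh), &out) != 0)
        return 0;
    return out.volume_id == volume_id && out.unique_id == unique_id;
}
```

A fixed byte order is used so that the handle means the same thing regardless of the host that decodes it; whether the server can then find the file again from that identifier is the open question in the thread.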
| Vladimir V. Saveliev | Re: [patch 30/41] reiserfs convert to new aops.
Hello, Nick
This is new version of the patch.
reiserfs_prepare_write and reiserfs_commit_write are still there, but they do not show themselves in any struct address_space_operations instance.
xattrs and ioctl use them directly.
| May 16, 1:22 pm 2007 |
| Jörn | [PATCH resend] introduce I_SYNC
While others are busy coming up with new silly names, here is something
substantial to worry about.
Patch fixes a deadlock problem well enough for LogFS to survive. The
problem itself is generic and seems to be ancient. Shaggy has code in
JFS from about 2.4.20 that seems to work around the deadlock. Dave
Chinner indicated that this could cause latency problems (not a
deadlock) on the NFS server side. I still suspect that NTFS has hit the
same deadlock and its current "fix" will cause data cor...
| May 16, 1:01 pm 2007 |
| Andrew Morton | Re: [PATCH resend] introduce I_SYNC
On Wed, 16 May 2007 19:01:14 +0200 Jörn Engel <joern@lazybastard.org>
gack, you like sticking your head in dark and dusty holes.
If we're going to do this then please let's get some exhaustive commentary
in there so that others have a chance of understanding these flags without
having to do the amount of reverse-engineering which you've been put through.
-
| May 16, 1:15 pm 2007 |
| Jörn | Re: [PATCH resend] introduce I_SYNC
I don't remember enjoying the experience too much. For a day or two I
through.
Not sure how the monasteries would cope with that.
Jörn
--
I've never met a human being who would want to read 17,000 pages of
documentation, and if there was, I'd kill him to get him out of the
gene pool.
-- Joseph Costello
-
| May 16, 1:47 pm 2007 |
| David Howells | [PATCH] AFS: Implement shared-writable mmap [try #2]
Implement shared-writable mmap for AFS.
The key with which to access the file is obtained from the VMA at the point
where the PTE is made writable by the page_mkwrite() VMA op and cached in the
affected page.
If there's an outstanding write on the page made with a different key, then
page_mkwrite() will flush it before attaching a record of the new key.
[try #2] Only flush the page if the page is still part of the mapping (truncate
may have discarded it).
Signed-off-by: David Howells <dho...
| May 16, 6:02 am 2007 |
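The rule described in this patch summary (flush an outstanding write made with a different key, then record the new key) can be modelled in a few lines of userspace C. This is a toy sketch only: `struct toy_page` and the `toy_*` functions are hypothetical stand-ins, not the kernel's `struct page` or the AFS implementation, and the model marks the page dirty as soon as it is made writable, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a page-cache page; not the kernel's struct page. */
struct toy_page {
    bool dirty;      /* has modifications not yet written back */
    int  writer_key; /* key recorded for the outstanding write, 0 if none */
    int  flushes;    /* number of write-backs performed so far */
};

/* Pretend to write the page back to the server. */
static void toy_flush(struct toy_page *pg)
{
    pg->dirty = false;
    pg->writer_key = 0;
    pg->flushes++;
}

/* Model of the rule: flush a write made under another key first,
 * then attach a record of the new key. */
static void toy_mkwrite(struct toy_page *pg, int key)
{
    if (pg->dirty && pg->writer_key != key)
        toy_flush(pg);
    pg->writer_key = key;
    pg->dirty = true; /* simplification: writable implies dirty here */
}

/* Helper for experimenting: two writers touch one page in turn;
 * returns how many flushes that caused. */
static int toy_flushes_for(int key_a, int key_b)
{
    struct toy_page pg = { false, 0, 0 };

    toy_mkwrite(&pg, key_a);
    toy_mkwrite(&pg, key_b);
    return pg.flushes;
}
```

In this model `toy_flushes_for(1, 2)` causes one flush and `toy_flushes_for(1, 1)` causes none. The sketch deliberately ignores the hard case raised later in the thread, where the flush itself is rejected by the server.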
| Nick Piggin | Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
I would strongly suggest you used (0, PAGE_CACHE_SIZE) for the range, and
have your nopage function DTRT.
Minor issue: you can just check for `if (!page->mapping)` for truncation,
which is the usual signal to tell the reader you're checking for truncate.
Rather than add this (not always correct) comment about the VM workings, I'd
just add a directive in the page_mkwrite API documentation that the filesystem
--
SUSE Labs, Novell Inc.
-
| May 16, 8:07 am 2007 |
| David Howells | Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
That tells prepare_write() that the region to be modified is the whole page -
which is incorrect. We're going to change a little bit of it.
Hmmm... Thinking about it again, I probably shouldn't be using
afs_prepare_write() at all. afs_prepare_write() does two things:
(1) Fills in the bits around the edges of the region to be modified if the
page is not up to date.
(2) Records the region of the page to be modified.
If afs_prepare_write() function is invoked by write(), then the regi...
| May 16, 9:16 am 2007 |
| Nick Piggin | Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
You don't know how much you're going to change, but it could be anything
in the range of 0 to PAGE_CACHE_SIZE. Clearly a (0, PAGE_CACHE_SIZE)
Oh god you're doing ClearPageUptodate directly on pagecache pages?
In general (modulo bugs and crazy filesystems), you're not allowed to have
!uptodate pages mapped into user addresses because that implies the user
would be allowed to see garbage.
If you follow that rule, then you can never have a !uptodate page be passed
into page_mkwrite (unless it ha...
| May 16, 9:32 am 2007 |
| David Howells | Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
The situation I have to deal with is a tricky one. Consider:
(1) User A modifies a page with his key. This change gets made in the
pagecache, but is not written back immediately.
(2) User B then wants to modify the same page, but with a different key.
This means that afs_prepare_write() has to flush A's writes back to the
server before B is permitted to write.
(3) The flush fails because A is no longer permitted to write to that file.
This means that the change in the...
| May 16, 12:12 pm 2007 |
| Nick Piggin | Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
Which ways? Are you talking about prepare_write being called from page_mkwrite,
or anywhere?
More generally it sounds like a nasty thing to have a writeback cache if it can
become incoherent (due to dirty pages that subsequently cannot be written
back) without notification. Have you tried doing a write-through one?
You may be clearing PG_uptodate, but isn't there still an underlying problem
that you can have mappings to the page at that point? If that isn't a problem
truncate_complete_page ...
| May 16, 12:32 pm 2007 |
| David Howells | Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
Because invalidate_inode_pages() forcibly removes the dirty flag from each page
in the inode and then calls invalidatepage() - and thus they don't get written
back, but some of those pages may contain writes from other processes. The
whole inode isn't owned by one user at a time.
I hadn't considered invalidate_inode_pages_range(), but that suffers from the
(1) prepare_write() is called with the target page locked and does not release
the lock. The truncation routines lock the page prior ...
| May 16, 12:56 pm 2007 |
| Nick Piggin | Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
It had better not. We use that sucker to nuke pagecache when we're trying to
You can drop the lock, do the invalidation, and return AOP_TRUNCATED_PAGE. The
I just mean more generally. simple write(2) writes, for starters.
For shared writable mmap? I don't know... does POSIX require mmap data
That's what the invalidate / truncate routines do.
--
SUSE Labs, Novell Inc.
-
| May 16, 1:28 pm 2007 |
| David Howells | Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
Hmmm... There's a danger of incurring a race by doing that. Consider two
processes both trying to write to a dirty page for which writeback will be
rejected:
(1) The first process gets EKEYREJECTED from the server, drops its lock and
is then preempted.
(2) The second process gets EKEYREJECTED from the server, drops its lock,
truncates the page, reloads the page and modifies it.
(3) The first process resumes and truncates the page, thereby splatting the
second process's wr...
| May 16, 2:45 pm 2007 |
| David Howells | Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
You can't do write-through caching for shared-writable mmap because the writes
go directly into the pagecache once the page is made writable, at least, short
of instruction emulation.
At some point in the future we'll be asked to turf the data back to the
I suspect so, but I don't know offhand. I want it to be coherent anyway,
otherwise it's inconsistent with OpenAFS and Arla (or at least more so).
Note also that the choice of write-through or write-back caching also has
implications for loc...
| May 16, 1:46 pm 2007 |
| Nick Piggin | Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
So did we just get your issues sorted? I _think_ *snip* is the
Howells code for "OK", but I can never be sure ;)
FWIW, as a rule, ClearPageUptodate should never be done by anyone,
least of all a filesystem on regular file pagecache. I need to go
through and audit this stuff... but so much backlog :P
Anyway, *snip* the side discussion.
--
SUSE Labs, Novell Inc.
-
| May 16, 1:59 pm 2007 |
| Christoph Hellwig | Re: + knfsd-exportfs-add-exportfsh-header-fix.patch added to...
Looks like cifs has grown an export_operations table since I did the
patch. But with only a get_parent method that returns an error it's
not useful at all, so we should rather remove the whole file:
Signed-off-by: Christoph Hellwig <hch@lst.de>
Index: linux-2.6/fs/cifs/cifsfs.c
===================================================================
--- linux-2.6.orig/fs/cifs/cifsfs.c 2007-05-16 07:55:35.000000000 +0200
+++ linux-2.6/fs/cifs/cifsfs.c 2007-05-16 07:55:50.000000000 +0200
@...
| May 16, 2:57 am 2007 |
| Steven French | Re: + knfsd-exportfs-add-exportfsh-header-fix.patch added to...
Any ideas what are the minimum export operation(s) that cifs would need to
add to export under nfsd? It was not clear to me after reading the
Exporting document in Documentation directory.
(some users had wanted to export files from Windows servers to nfs clients
files by putting an nfs server mounted over cifs in between - I realize
that this can corrupt data due to nfs client caching etc., as even in some
cases could happen if you try to export a cluster file system under nfsd).
Steve F...
| May 16, 10:55 am 2007 |
| J. Bruce Fields | Re: + knfsd-exportfs-add-exportfsh-header-fix.patch added to...
What cases are you thinking of?
--b.
-
| May 16, 12:02 pm 2007 |
| Steven French | Re: + knfsd-exportfs-add-exportfsh-header-fix.patch added to...
I thought that until a few days ago, a sequence like the following (two
nfs servers exporting the same clustered data)
on client 1 lock range A through B of file1 (exported from nfs server 1)
on client 2 lock range A through C of file 1 (exported from nfs server 2)
on client 1 write A through B
on client 2 write A through C
on client 1 unlock A through B
on client 2 unlock A through C
would corrupt data (theoretically could be fixed as nfsd calls lock
methods
[ message continues ] http://git.kernel.org/?p=linux...
| May 16, 1:03 pm 2007 |
| J. Bruce Fields | Re: + knfsd-exportfs-add-exportfsh-header-fix.patch added to...
Hm. How could nfsd get stale metadata?
I'm just (probably naively) assuming that a "cluster" filesystem
attempts to provide much higher cache consistency than actually
necessary to keep nfs clients happy. But, if not, it would be nice to
understand the problem.
--b.
-
| May 16, 5:33 pm 2007 |
| Christoph Hellwig | Re: + knfsd-exportfs-add-exportfsh-header-fix.patch added to...
That patch was missing the Makefile hunk, here's the proper one:
(I wish there was a way to avoid having to do quilt add every time)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Index: linux-2.6/fs/cifs/cifsfs.c
===================================================================
--- linux-2.6.orig/fs/cifs/cifsfs.c 2007-05-16 07:55:35.000000000 +0200
+++ linux-2.6/fs/cifs/cifsfs.c 2007-05-16 07:55:50.000000000 +0200
@@ -49,10 +49,6 @@
static struct quotactl_ops cifs_quotactl_ops;
#en...
| May 16, 8:53 am 2007 |
| Bill Davidsen | Re: Software raid0 will crash the file-system, when each dis...
If I read this correctly, the problem is with JFS rather than RAID? Have
you tried not mounting the JFS filesystem but just starting the array
which crashes, so you can read bits of it, etc, and verify that the
array itself is working?
And can you run an fsck on the filesystem, if that makes sense? I assume
you got to actually write a f/s at one time, and I've never used JFS
under Linux. I spent five+ years using it on AIX, though, complex but
--
bill davidsen <davidsen@tmr.com>
...
| May 16, 1:28 pm 2007 |
| david | Re: Software raid0 will crash the file-system, when each dis...
he had the same problem with xfs.
David Lang
| May 16, 1:58 pm 2007 |
| Andreas Dilger | Re: Software raid0 will crash the file-system, when each dis...
Check if your kernel has CONFIG_LBD enabled.
The kernel doesn't check if the block layer can actually write to
a block device > 2TB.
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
-
| May 16, 10:04 am 2007 |
| david | Re: Software raid0 will crash the file-system, when each dis...
my experience is that if you don't have CONFIG_LBD enabled then the kernel
will report the larger disk as 2G and everything will work, you just won't
get all the space.
plus he seems to be crashing around 500G of data
and finally (if I am reading the post correctly) if he configures the
drives as 4x2.2TB=11TB instead of 2x5.5TB=11TB he doesn't have the same
problem.
I'm getting ready to setup a similar machine that will have 3x10TB (3 15
disk arrays with 750G drives), but won't be ready...
| May 16, 2:04 pm 2007 |
| Jeff Zheng | RE: Software raid0 will crash the file-system, when each dis...
You will definitely meet the same problem. As very large hardware disk
becomes more and more popular, this will become a big issue for software
raid.
Jeff
-----Original Message-----
From: david@lang.hm [mailto:david@lang.hm]
Sent: Thursday, 17 May 2007 6:04 a.m.
To: Andreas Dilger
Cc: Jeff Zheng; linux-kernel@vger.kernel.org;
linux-fsdevel@vger.kernel.org
Subject: Re: Software raid0 will crash the file-system, when each disk
is 5TB
my experience is that if you don't have CONFIG_LB...
| May 16, 5:44 pm 2007 |
| Jan Engelhardt | Re: Software raid0 will crash the file-system, when each dis...
You could emulate it with VMware. Big disks are quite "cheap" when
they are not allocated.
Jan
--
| May 16, 2:16 pm 2007 |
| Jeff Zheng | RE: Software raid0 will crash the file-system, when each dis...
Problem is that it only happens when you actually write data to the
raid. You need the actual space to reproduce the problem.
Jeff
-----Original Message-----
From: Jan Engelhardt [mailto:jengelh@linux01.gwdg.de]
Sent: Thursday, 17 May 2007 6:17 a.m.
To: david@lang.hm
Cc: Andreas Dilger; Jeff Zheng; linux-kernel@vger.kernel.org;
linux-fsdevel@vger.kernel.org
Subject: Re: Software raid0 will crash the file-system, when each disk
is 5TB
few more days.
You could emulate it with VMware. B...
| May 16, 5:42 pm 2007 |
| Dave Kleikamp | Re: [PATCH 1/5][TAKE3] fallocate() implementation on i86, x8...
i_blocks will be updated, so it seems reasonable to update ctime. mtime
shouldn't be changed, though, since the contents of the file will be
--
David Kleikamp
IBM Linux Technology Center
-
| May 16, 8:21 am 2007 |
| David Chinner | Re: [PATCH 1/5][TAKE3] fallocate() implementation on i86, x8...
That's assuming blocks were actually allocated - if the prealloc range already
has underlying blocks there is no change and so we should not be changing
mtime either. Only the filesystem will know if it has changed the file, so I
think that timestamp updates need to be driven down to that level, not done
blindly at the highest layer....
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
| May 16, 7:40 pm 2007 |
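The argument that only the filesystem knows whether a preallocation changed anything can be sketched as follows. This is a toy userspace model under stated assumptions: `struct toy_inode`, `toy_prealloc`, and the block-count bookkeeping are hypothetical and do not reflect the proposed `->fallocate()` interface or any real filesystem.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_OLD_CTIME 100u /* arbitrary starting timestamp for the helper */

/* Toy inode: just a block count and a change time. */
struct toy_inode {
    uint64_t blocks;
    uint64_t ctime;
};

/* Ensure at least want_blocks are allocated. The timestamp is touched
 * only when blocks were really added. Returns the number of new blocks. */
static uint64_t toy_prealloc(struct toy_inode *ino, uint64_t want_blocks,
                             uint64_t now)
{
    uint64_t added = 0;

    if (want_blocks > ino->blocks) {
        added = want_blocks - ino->blocks;
        ino->blocks = want_blocks;
        ino->ctime = now;
    }
    return added;
}

/* Helper for experimenting: ctime after one preallocation request. */
static uint64_t toy_ctime_after(uint64_t have, uint64_t want, uint64_t now)
{
    struct toy_inode ino = { have, TOY_OLD_CTIME };

    toy_prealloc(&ino, want, now);
    return ino.ctime;
}
```

A caller above this layer cannot tell the two cases apart without the return value, which is the point being made about driving timestamp updates down into the filesystem.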
| Amit K. Arora | Re: [PATCH 1/5][TAKE3] fallocate() implementation on i86, x8...
I agree. Thus the ctime should change for FA_PREALLOCATE mode also
(which does not change the file size) - if we end up having this
additional mode in near future.
--
Regards,
-
| May 16, 8:37 am 2007 |
| Amit K. Arora | Re: [PATCH 1/5][TAKE3] fallocate() implementation on i86, x8...
I think ->fallocate() should return a "long", since sys_fallocate() has
to return what ->fallocate() returns and hence their return type should
I will change the ext4_fallocate() to return a "long" (in patch 4/5)
in the next post.
Agree ?
Thanks!
--
Regards,
| May 16, 8:31 am 2007 |
| Christoph Hellwig | Re: [PATCH 2/2] AFS: Implement shared-writable mmap
It looks like you really want Dave's generic page_mkwrite. And we should
really get that one merged for 2.6.22.
-
| May 16, 3:10 am 2007 |
| Nick Piggin | Re: [PATCH 2/2] AFS: Implement shared-writable mmap
Dave asked me about that the other day (in relation to the ->fault ops)...
I have no problem merging it for 2.6.22 and rebasing my patches on top.
--
SUSE Labs, Novell Inc.
-
| May 16, 3:17 am 2007 |
| Hugh Dickins | Re: [PATCH 2/2] AFS: Implement shared-writable mmap
That's right, the overhead of the lock_page()/unlock_page() in the common
path of faulting, and of the extra call to unmap_mapping_range() when
truncating (because page lock doesn't satisfactorily replace the old
So far, yes. I expect it'll surface in some reallife workload
sometime, but let's not get too depressed about that. I guess
It is a pity to be adding overhead to a common path in order to fix
Again, rather too blithely said. You have a deep well of ingenuity,
Getting a "yes" or ...
| May 16, 12:36 pm 2007 |
| Nick Piggin | Re: [PATCH 2/2] AFS: Implement shared-writable mmap
I say I believe scalability will not be a huge issue, because for
concurrent faulters on the same page, they still have cacheline
contention beginning before we lock the page (tree_lock), and
ending after we unlock it (page->_count), and a few others in the
middle for good measure. I sure don't think it is going to help,
but I don't think it would be a great impact on an already sucky
workload.
We would have contention against other sources of lock_page, but
OTOH, we want to fix up the clear_...
| May 16, 1:14 pm 2007 |
| Hugh Dickins | Re: [PATCH 2/2] AFS: Implement shared-writable mmap
I'm hoping you intended one less negative ;)
-
| May 16, 1:26 pm 2007 |
| David Howells | Re: [PATCH 2/2] AFS: Implement shared-writable mmap
Or did you mean one fewer negative? :-)
David
-
| May 16, 1:48 pm 2007 |
| Chuck Ebbert | Re: [PATCH 2/2] AFS: Implement shared-writable mmap
Or one more...
-
| May 16, 1:34 pm 2007 |
| Nick Piggin | Re: [PATCH 2/2] AFS: Implement shared-writable mmap
Derrr... I'm an idiot!
--
-
| May 16, 1:30 pm 2007 |
| Pavel Machek | Re: [PATCH] LogFS take three
Hi!
Please just delete it, not comment it out like this.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
| May 16, 3:17 pm 2007 |
| Jörn | Re: [PATCH] LogFS take three
I know. Top 3 items of my todo list are:
- Handle system crashes
- Add second journal
That will get resurrected, even before the move to userspace. I had to
change the filesystem format for compression support and this is an
artifact of the transition phase.
Jörn
--
Ninety percent of everything is crap.
-- Sturgeon's Law
-
| May 16, 3:23 pm 2007 |
| Pekka J Enberg | Re: [PATCH] LogFS take three
Note that BUG() can be a no-op so dumping something on disk might not make
sense there. This seems useful, but you probably need to make this bit
more generic so that using BUG() proper in your filesystem code does the
Please drop this wrapper function. It's better to open-code the error
This looks fishy. All reads and writes are serialized by compr_mutex
Seems wasteful to first read the data in a scratch buffer and then
memcpy() it immediately for the COMPR_NONE case. Any reason why ...
| May 16, 6:21 am 2007 |
| Jörn | Re: [PATCH] LogFS take three
Hmm. I am not sure how this could be generic and still make sense.
LogFS has some unused write-once space in the superblock segment.
Embedded people always have problems debugging and were suggesting using
this to store debug information. That allows me to ask for a filesystem
image and get both the offending image plus a crash dump. It also
allows me to abort mounting if I ever see an existing crash dump (not
implemented yet). "First failure data capture" was an old IBM slogan
and the ...
| May 16, 8:26 am 2007 |
