Hi, From what I know of the implementation details and of what it would take to fix the problems, is it safe to assume that ZFS will never become stable during the 7.x lifetime?
Have you heard of the logical fallacy called "plurium interrogationum"? You may not be familiar with the phrase (which is Latin for "multiple questions"), but it's what you're doing here: asking a question which is impossible to answer truthfully because it is based on an incorrect premise, and to answer the question correctly you must first discuss the premise. It's a favorite Hollywood plot device, because you can have the smart-aleck lawyer interrupt the confused witness and insist on a yes or no answer, forcing the witness to implicitly agree with the premise. I doubt it would work in a real-life court, though, because judges tend to be smart people. But I digress. Your question is based on the premise that ZFS in FreeBSD 7 is unstable. That premise is false. There are issues with auto-tuning of certain parameters, which can cause kmem exhaustion, but they are easily worked around by setting a few tunables. It has worked very well for me (raidz, 1.2 TB pool, 4 GB RAM, ~60 file systems and twice as many snapshots) after I added the following lines to loader.conf: vm.kmem_size="1G" vfs.zfs.arc_min="64M" vfs.zfs.arc_max="512M" DES -- Dag-Erling Smørgrav - des@des.no _______________________________________________ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
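The relationship between the three tunables quoted above (arc_min below arc_max, and arc_max comfortably below kmem_size) can be checked mechanically. The following is only a minimal sketch: the helper names parse_size, parse_loader_conf and check_tunables are hypothetical, and it assumes that the K/M/G/T suffixes are 1024-based.

```python
import re

# Assumption: size suffixes are binary multiples (1024-based).
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a string such as "512M" or "671088640" to a byte count."""
    m = re.fullmatch(r"(\d+)([KMGT]?)", text.strip().upper())
    if m is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def parse_loader_conf(text):
    """Return {name: value} for simple name="value" lines (hypothetical helper)."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip('"')
    return settings


def check_tunables(settings):
    """Return a list of warnings; an empty list means the ordering holds."""
    kmem = parse_size(settings["vm.kmem_size"])
    arc_min = parse_size(settings["vfs.zfs.arc_min"])
    arc_max = parse_size(settings["vfs.zfs.arc_max"])
    warnings = []
    if arc_min > arc_max:
        warnings.append("arc_min exceeds arc_max")
    if arc_max >= kmem:
        warnings.append("arc_max is not smaller than kmem_size")
    return warnings


# The three lines from the message above.
EXAMPLE = '''
vm.kmem_size="1G"
vfs.zfs.arc_min="64M"
vfs.zfs.arc_max="512M"
'''

if __name__ == "__main__":
    print(check_tunables(parse_loader_conf(EXAMPLE)))
```

With the values from the message, the check produces no warnings: arc_max is half of kmem_size. It says nothing about whether those values suit a given workload, which is the point under dispute in the rest of the thread.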
On 07/01/2008, Dag-Erling Smørgrav <des@des.no> wrote: > Your question is based on the premise that ZFS in FreeBSD 7 is unstable. > That premise is false. At most, we'll have to agree to disagree. A "tuning" of the system (at least from my experience) is about system performance, not whether the system will crash or not. You may define the word to mean something else but that's your thing. The reason I'm aggressively discussing this is that labeling the problem as "tuning" will, for any non-trivial task which has some growth in system load, result in a server that needs constant tuning just to survive another day. What is tuned today may as well result in a crash tomorrow if the load rises. Web servers are notorious for this (though other types have of course similar behaviour) - a "slashdotting" of a "properly tuned" FreeBSD system with ZFS will not result in a slowdown - it will result in the system crashing. This is not acceptable, and therefore dismissing it as "just tuning" is counterproductive and bad engineering.
ZFS is clearly marked as experimental, so it's reasonable to require tuning to avoid crashes. If that's still the case when the experimental status is lifted then you can have this argument all over again. cheers, Andrew
To sum up this thread, let me present ZFS status as of today. Before I do that, one explanation. I was away from FreeBSD for about 3-4 weeks because of real-life issues, etc. I hope I'm now back for good. Let me also use this again to invite any interested committers to help work on ZFS (I've been inviting people to help from day one). Ok. The most pressing issues currently are: 1. kmem_map exhaustion. 2. Low memory deadlocks in ZFS itself. I believe the 2nd problem is already fixed in OpenSolaris, at least that was my impression when I made the last integration; I need to double check. If that's true, I'll try to commit the fix before 7.0-RELEASE. The 1st problem has of course a much wider audience. First of all you need: http://people.freebsd.org/~pjd/patches/vm_kern.c.2.patch The patch is not yet committed, because I was discussing better solutions with alc@. I don't think we (he) will be able to come up with something better before 7.0-RELEASE, so I'm going to ask re@ for approval for this patch today. Note that it is a low-risk change, because it is executed only in situations where the system would panic anyway. Of course it is so much better to use ZFS on 64bit systems, but it also works on i386. I've been running ZFS in production for many months on two i386 systems. One has 1GB of memory and this tuning in loader.conf: vfs.zfs.prefetch_disable=1 vm.kmem_size=671088640 vm.kmem_size_max=671088640 I have three ZFS pools there, no UFS at all. The load is rather light, serving large files. No panics. The second "production" box is my laptop. I have 2GB of RAM (it worked fine with 1GB too), but I do have 'options KVA_PAGES=512' in my kernel config and my loader.conf looks like this: vm.kmem_size=1073741824 vm.kmem_size_max=1073741824 vfs.zfs.prefetch_disable=1 My laptop is ZFS-only. No panics whatsoever. The box I've been running ZFS on for the longest time is an amd64 system with 1GB of RAM. This box is used for backups (ZFS snapshots are so damn handy) and guess...
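The raw byte counts in the loader.conf settings above are easier to compare once converted. A small sketch of the arithmetic follows; the function names are hypothetical, and the KVA_PAGES conversion assumes a non-PAE i386 kernel in which each KVA_PAGES unit maps 4 MB of kernel virtual address space.

```python
MIB = 1024 ** 2


def to_mib(nbytes):
    """Express a byte count in whole MiB, refusing values that are not exact."""
    if nbytes % MIB:
        raise ValueError(f"{nbytes} is not a whole number of MiB")
    return nbytes // MIB


# vm.kmem_size values quoted in the message above.
SERVER_KMEM = 671088640
LAPTOP_KMEM = 1073741824

# Assumption: non-PAE i386, one KVA_PAGES unit covers 4 MiB of kernel
# virtual address space.
KVA_PAGE_MIB = 4


def kva_mib(kva_pages):
    """Kernel virtual address space, in MiB, for a given KVA_PAGES setting."""
    return kva_pages * KVA_PAGE_MIB


if __name__ == "__main__":
    print(to_mib(SERVER_KMEM))  # 640
    print(to_mib(LAPTOP_KMEM))  # 1024
    print(kva_mib(512))         # 2048
```

Under that assumption, the laptop's 1 GB kmem_size fits inside a 2 GB kernel address space, which is consistent with the message pairing the larger kmem_size with a raised KVA_PAGES.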
I'd suggest we do give all three warnings (KVA_PAGES, kmem_size, i386) at once, preferably both when the ZFS module loads and when a zpool is created. I think it's important that the three pieces of information be given at the same time so the user doesn't need to hunt for solutions after panics. Your comment that people are panicking more than ZFS is correct, but that illustrates the importance people give to having a file system not crash on them :)
Ivan Voras wrote: > Pawel Jakub Dawidek wrote: > > > Let's try to think how we can warn people clearly about proper tuning and > > what proper tuning actually means. I think we should advise increasing > > KVA_PAGES on i386 and not only vm.kmem_size. We could also warn that > > running ZFS on 32bit systems is not generally recommended. Any other > > suggestions? > > I'd suggest we do give all three warnings (KVA_PAGES, kmem_size, i386) > at once, preferably both when the ZFS module loads and when a zpool is > created. I think it's important that the three pieces of information be > given at the same time so the user doesn't need to hunt solutions > after panics. How about including the URL of the ZFS tuning guide in the warning message: http://wiki.freebsd.org/ZFSTuningGuide It contains all the necessary information for both i386 and amd64 machines. It can also easily be updated if necessary so people always get the most up-to-date information. Best regards Oliver -- Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M. Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung: secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht München, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd "Documentation is like sex; when it's good, it's very, very good, and when it's bad, it's better than nothing." -- Dick Brandon
Pawel said in Nov: The Wiki should be changed. Allow ZFS to autotune it, don't tune it by hand. ----- Yet the wiki still recommends hand tuning? Cheers. -- Mark Powell - UNIX System Administrator - The University of Salford Information Services Division, Clifford Whitworth Building, Salford University, Manchester, M5 4WT, UK. Tel: +44 161 295 6843 Fax: +44 161 295 5888 www.pgp.com for PGP key
Actually, it fails to mention the most important bit: vfs.zfs.arc_max, which allows you to restrict the amount of memory used by ZFS to something comfortably smaller than vm.kmem_size. DES -- Dag-Erling Smørgrav - des@des.no
It was in the ZFS tuning guide, but was removed in revision 20. It doesn't say why the change was made. Scot
On 08/01/2008, Dag-Erling Smørgrav <des@des.no> wrote: > Actually, it fails to mention the most important bit: vfs.zfs.arc_max, > which allows you to restrict the amount of memory used by ZFS to > something comfortably smaller than vm.kmem_size. Pawel, is it recommended? If it is, I'll add it to the page.
With the vm_kern.c.2.patch, it doesn't seem to be an issue, at least for me. "c" always stays far away from "c_max": kstat.zfs.misc.arcstats.p: 218885440 kstat.zfs.misc.arcstats.c: 342346436 kstat.zfs.misc.arcstats.c_min: 20971520 kstat.zfs.misc.arcstats.c_max: 503316480 kstat.zfs.misc.arcstats.size: 342342144 vm.kmem_size: 671088640 hw.physmem: 1064771584 vm.kmem_map_panics_avoided: 171 The last sysctl was added by me to track how often the patch saved my system from a panic :) I suppose lowering arc_max would reduce the number of times the routine was called, though. -- Dan Nelson dnelson@allantgroup.com
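How far "c" stays from "c_max", and how much of kmem_size the ARC ceiling occupies, can be read off the sysctl values quoted above. This is a rough sketch only: parse_sysctl and arc_ratios are hypothetical helper names, and the input is assumed to be "name: integer" lines as printed by sysctl.

```python
# Values copied from the message above.
SAMPLE = """\
kstat.zfs.misc.arcstats.p: 218885440
kstat.zfs.misc.arcstats.c: 342346436
kstat.zfs.misc.arcstats.c_min: 20971520
kstat.zfs.misc.arcstats.c_max: 503316480
kstat.zfs.misc.arcstats.size: 342342144
vm.kmem_size: 671088640
"""


def parse_sysctl(text):
    """Parse "name: integer" lines into a dict, skipping anything else."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            values[name.strip()] = int(value)
    return values


def arc_ratios(values):
    """Return (c / c_max, c_max / kmem_size) as fractions."""
    c = values["kstat.zfs.misc.arcstats.c"]
    c_max = values["kstat.zfs.misc.arcstats.c_max"]
    kmem = values["vm.kmem_size"]
    return c / c_max, c_max / kmem


if __name__ == "__main__":
    target_vs_max, max_vs_kmem = arc_ratios(parse_sysctl(SAMPLE))
    print(f"c is {target_vs_max:.0%} of c_max")
    print(f"c_max is {max_vs_kmem:.0%} of vm.kmem_size")
```

On these numbers the ARC target sits at roughly two thirds of its ceiling, and the ceiling itself is three quarters of kmem_size.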
The tuning information belongs in the zfs(8) manual page. -- Steve
Having read the thread and people's reasons for using ZFS, it does seem that they are trying to use ZFS to solve non-problem problems, especially since someone commented that they use a 1:10 kmem:HD space ratio! Igor :-)
I'm not sure if anyone has mentioned this yet in the thread, but another thing worth taking into account in considering the stability of ZFS is whether or not Sun considers it a production feature in Solaris. Last I heard, it was still considered an experimental feature there as well. Robert N M Watson Computer Laboratory University of Cambridge
Last I heard, rsync didn't crash Solaris on ZFS :)
I can't provide a citation about a thing that doesn't happen - you don't hear things like "oh and yesterday I ran rsync on my Solaris with ZFS and *it didn't crash*!" often. But, with some grains of salt taken, consider these Google results: * searching for "rsync crash solaris zfs": 790 results, most of them obviously irrelevant * searching for "rsync crash freebsd zfs": 10,800 results; a small number of the results are from this thread, some are duplicates, but it's a large number in any case. I feel that the number of Solaris+ZFS installations worldwide is larger than that of FreeBSD+ZFS and they've had ZFS longer.
I used zfs on FreeBSD current amd64 around summer 2006 as a samba server for internal use on a dual Xeon (first generation 64-bit, somewhat slow and hot) with 4 GB RAM and two QLogic HBAs attached to approx. 8 TB of storage. I did not once experience any kernel panic or other unplanned stop. But whenever I manually mounted an smbfs share the terminal would not return to the command line. I upgraded in October 2007 and the smbfs mount returned to the command line and I thought I was happy. Until I started to get the "kmem_map too small" kernel panics when doing much I/O (syncing 40 GB of small files). I tuned the values as indicated in the zfs tuning guide and rebooted, and increased the values as the kernel panics persisted. When I increased the values even more I ended up with a kernel which refused to boot, boy I was almost getting a panic myself :-) Applying Pawel's patch did make the server survive two or three 40 GB rsync runs, so the patch did help. But we were approaching the xmas season, which is a very critical time for us, so I migrated to Solaris 10. The Solaris server has had no downtime, but to conclude that Solaris is more stable in my situation is premature. -- regards Claus When lenity and cruelty play for a kingdom, the gentlest gamester is the soonest winner. Shakespeare
Almost all Solaris systems are 64 bit. Kris
So, let's be honest here. ZFS is simply unreliable on FreeBSD/i386. There are things that you can do to mitigate the problems, and in certain well controlled environments you might be able to make it work well enough for your needs. But as a general rule, don't expect it to work reliably, period. This is backed up by Sun's own recommendation to not run it on 32-bit Solaris. But let's also be honest about ZFS in the 64-bit world. There is ample evidence that ZFS basically wants to grow unbounded in proportion to the workload that you give it. Indeed, even Sun recommends basically throwing more RAM at most problems. Again, tuning is often needed, and I think it's fair to say that it can't be expected to work on arbitrary workloads out of the box. Now, what about the other problems that have been reported in this thread by Ivan and others? I don't think that it can be said that the only problem that ZFS has is with memory. Unfortunately, it looks like these "other" problems aren't well quantified, so I think that they are being unfairly dismissed. But at the same time, maybe these other problems are rare and unique enough that they represent very special cases that won't be encountered by most people. But it also tells me that ZFS is still immature, at least in FreeBSD. The universal need for tuning combined with the poorly understood problem reports tells me that administrators considering ZFS should expect to spend a fair amount of time testing and tuning. Don't expect it to work out of the box for your situation. That's not to say that it's useless; there are certainly many people who can attest to it working well for them. Just be prepared to spend time and possibly money making it work, and be willing to provide good problem reports for any non-memory related problems that you encounter. Scott
JFWIW - last night's trial: an OpenSolaris 'Indiana' devel ISO installed on a Core-2 Duo with 2GB created something it reported as 'Z-lite' (IIRC - it wasn't worth wasting HDD space on...) Anyone know if this is 'different' on Solaris for i386 from -64? i.e. - does Sun use a 'lite' and a 'full' version? And, if so, [is there | should there be] an equivalent in the FreeBSD world? or Clearly so. So much so that IMNSHO, discussion of most *remaining* ZFS issues more properly belongs on the ZFS-specific mailing list. I don't see much - if any - remaining evidence that there are things either 'wrong' or even sub-optimal with FreeBSD *itself* that only ZFS exposes. Au contraire - FreeBSD seems to be as accommodating to ZFS needs as can be. The rest seems to be up to ZFS code, 'sensing' of resources & load, manual & auto-config, dynamic adjustment - more graceful degradation & recovery. Whatever. JM2CW, but the level of 'traffic' on this list in re still-experimental-at-best ZFS is distracting attention from issues that are more universal, critical to more users and uses - and more in need of scarce attention 'Real Soon Now'. It almost begs dismissal of ZFS posts to the bespoke list out-of-hand. ZFS is still eminently 'avoidable' for now. Reports of I/O problems, drivers that can corrupt data on *UFS* are a whole 'nuther matter.. Bill
For my part it's because I'm "desperate" for a good file system, and ZFS seemed to be "it" for a while. I'd also settle for any other, including a stable version of UFS that's pleasant to work with on TB-sized drives (Sun's UFS? BLUFFS?), XFS, Ext4, LFS, HAMMER, whatever. I've tried contacting the author of BLUFFS, but without optimistic results.
None are perfect. But ZFS is just *too* new. And not just on *BSD. If IBM had not already had GPFS, Sun might never even have 'invented' ZFS. The 'other' ones with the longest 'history' - where known problems have known avoidance/workaround - may well be XFS and JFS. Heavy-lifters with commercial track-records, both. Not to mention UFS... I'm still in the practice of 'slicing' into 50 GB or so - 100GB max - no matter *what* the drive size is. So where's the 'beef'? Half-terabyte *files*? I surely hope not.. At some point too many eggs (files) in one basket just makes b/u restore a nightmare. There are no silver bullets. Drives fail. Controllers fail, and sometimes had done so long before anyone noticed they were subtly corrupting data. So even RAID arrays and offline b/u can fail one.. ZFS doesn't 'fix' all that - just approaches a fix in an all-software manner. Other failings aside, there is an overhead penalty for all the 'handling'. Coders may believe in that. It's what they do. I'll take simplicity, redundant hardware. And compartmentalization. Faster, cheaper, lasts a long time. And takes more manageable sized chunks out of yerass when it DOES go tits-up. As they all do. Bill
On Sun, 2008-01-06 at 22:58 +0000, 韓家標 Bill Hacker wrote: Could you by any chance elaborate -- from the information available to me, I did not get an impression that ZFS is the cluster-aware filesystem OT: As someone, who has ~10TB of compressed high-fidelity documents in production (AIX/JFS2), I can tell you that this approach will only take Not any better than 200 x 50GB filesystems ;) -- Alexandre "Sunny" Kovalenko
From the Wikipedia article on Lustre... "...Sun completed its acquisition of Cluster File Systems, Inc., including the Lustre file system, on October 2, 2007, with the intention of bringing the benefits of Lustre technologies to Sun's ZFS file system and the Solaris operating system." So Sun has had what? 2+ months? to try to fill a ZFS 'hole' that was worth a major investment? See also traffic on *Sun's* ZFS list. Far more features than that - 'robust', 'fault tolerant', 'Disaster Recovery' ... all the usual buzzwords. And nothing prevents using 'cluster' tools on a single box. Not storage-wise anyway. More importantly - GPFS has just under ten years in the market, and has become a primary player in Supercomputing as well as video on demand et al. BTW: UFS(1) / FFS - have very respectable upper-bounds - UFS2 even more so, so (even) Sun is not totally dependent on ZFS. Unless they choose to become so... Finally - the principal architect/miracle worker of ZFS on FreeBSD - pjd@ - seems to be heavily committed on other matters now, and may be so for some time to come. Ergo 'caution' remains appropriate for production use w/r ZFS - perhaps until 8.X. Bill
韓家標 Bill Hacker writes: > > OTOH that's all GPFS is. > > Far more features than that - 'robust', 'fault tolerant', 'Disaster Recovery' > ... all the usual buzzwords. > > And nothing prevents using 'cluster' tools on a single box. Not storage-wise anyway. Having had the misfortune of being involved in a cluster which used GPFS, I can attest that GPFS is anything but "robust" and "fault tolerant" in my experience. Granted this was a few years ago, and things may have improved, but that one horrible experience was sufficient to make me avoid GPFS for life. Drew
Would you mind sharing your experience, perhaps in private e-mail? I am especially interested in the platform you have used (as in AIX or Linux) and underlying storage configuration (as in directly attached vs. separate file system servers). I am running a few small AIX clusters in the lab using GPFS 3.1 over iSCSI and so far have been fairly pleased with that. However, OP's point was that ZFS has inherent cluster abilities, of which I have found no information whatsoever. -- Alexandre "Sunny" Kovalenko
"Alexandre \"Sunny\" Kovalenko" writes: > > On Wed, 2008-01-09 at 08:23 -0500, Andrew Gallatin wrote: > > 韓家標 Bill Hacker writes: > > > > OTOH that's all GPFS is. > > > > > > Far more features than that - 'robust', 'fault tolerant', 'Disaster Recovery' > > > ... all the usual buzzwords. > > > > > > And nothing prevents using 'cluster' tools on a single box. Not storage-wise anyway. > > > > Having had the misfortune of being involved in a cluster which used > > GPFS, I can attest that GPFS is anything but "robust" and "fault > > tolerant" in my experience. Granted this was a few years ago, and > > things may have improved, but that one horrible experience was > > sufficient to make me avoid GPFS for life. > Would you mind sharing your experience, maybe in the private E-mail. I > am especially interested in the platform you have used (as in AIX or > Linux) and underlying storage configuration (as in directly attached vs. > separate file system servers). > > I am running few small AIX clusters in the lab using GPFS 3.1 over iSCSI > and so far was fairly pleased with that. Linux, with GPFS 1.x over ethernet. If there was even the slightest load on the ethernet network, and a GPFS heartbeat message got lost, the entire FS would die. That did not meet my definition of robust :(. Note that this was nearly 4 years ago, so it has likely gotten better. > However, OP's point was that ZFS has inherent cluster abilities, of > which I have found no information whatsoever. Indeed, but I do remember hearing the Lustre/ZFS rumors. Drew
I agree. I built a new backup server (dual-core processor, AMD64 kernel, 3GB RAM, 10x 500GB SATA disks, Areca 1230) to receive data from all my local servers on a 4TB zfs pool (using compression, ~300 snapshots and ~90 filesystems) and afterwards write it to LTO3 tape drives. It worked fine after the required tuning (vm patch, prefetch disable, etc). But I lost my data two times. The first was on 11/12/2007: the system froze and after reboot I got a panic when trying to mount the zfs pool: Dump header from device /dev/ad0s1b Architecture: amd64 Architecture Version: 2 Dump Length: 103477248B (98 MB) Blocksize: 512 Dumptime: Mon Nov 12 14:56:12 2007 Hostname: Magic: FreeBSD Kernel Dump Version String: FreeBSD 7.0-BETA2 #0: Mon Nov 12 11:49:07 BRST 2007 root@:/usr/src/sys/amd64/compile/MANNY.debug Panic String: solaris assert: ss == NULL, file: /usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/fs/zfs/space_map.c, line: 110 Dump Parity: 2217569595 Bounds: 3 Dump Status: good After some days of trying things with Pawel (and his contacts among the Solaris people), we couldn't figure out the problem, so I accepted the loss and recreated the zpool. I decided to give it another try, put in more memory and did more "tuning", and for one month all worked fine except for the slowness when copying small files to a tape drive (I started a new thread about that on -performance http://www.mail-archive.com/freebsd-performance@freebsd.org/msg01764.html), when I got another crash, this time with: ZFS(panic): zfs: allocating allocated segment(offset=2781261201408 size=131072) And again, I couldn't recover my zpool. I had chosen zfs because of the fantastic features available: instant snapshots, clones, native/transparent compression, the way that you can create filesystems inside the pool, limiting and reserving space; all this made my backup solution simply amazing. But these crashes forced me to step back, and without a filesystem that can handle TB without tedious fsck I had to ...
To be clear, in this thread I have been mostly restricting myself to discussion of kmem problems only, although I have also noted that there are known ZFS bugs including bugs that are unfixed even in Solaris (the ZIL low memory deadlock is one of them). Indeed, pjd has a long list of bug reports from me :) I agree with the rest of this summary. Kris
I guess what makes me mad about ZFS is that it's all-or-nothing; either it works, or it crashes. It doesn't automatically recognize limits and make adjustments or sacrifices when it reaches those limits, it just crashes. Wanting multiple gigabytes of RAM for caching in order to optimize performance is great, but crashing when it doesn't get those multiple gigabytes of RAM is not so great, and it leaves a bad taste in my mouth about ZFS in general. Scott
I agree with the sentiment. I don't know about its implementation, but surely some kind of backout could have been implemented? I'm just guessing here: maybe the problem is in M_NOWAIT - maybe there could be a M_NOWAIT_BUT_ALLOW_NULL that would be safe to use in non-sleepable code but could return NULL, which could be tested and the whole file system request postponed...
Um, I don't think this part of the post means what I wanted it to mean - please ignore it - ETOOTIRED :)
Scott Long wrote: To be fair - every fs on the planet had to go through this at one time or another. We have been perhaps 'spoiled' by the odd runaway log or such that has pushed UFS to over 103% 'full', struggled on regardless, allowing us to ssh in from 12,000 miles away, kill the offender, clean up the mess, and soldier-on w/o even a reboot, let alone a crash. ZFS will (probably) get there one day as well. But at present, it has become a distraction we don't need. We're chasing promises... I'd happily trade all future interest in ZFS for better ufs, nfs, smbfs, ntfs, xfs, jfs, et al performance/safety/compatibility, ... if only 'coz that's where the bulk of the data we need to 'talk to' actually resides - not on ZFS or GPFS. Bill
My admittedly second-hand understanding is that ZFS shows similarly gratuitous memory use on both Mac OS X and Solaris. One advantage Solaris has is that it runs primarily on expensive 64-bit servers with lots of memory. Part of the problem on FreeBSD is that people run ZFS on systems with 32-bit CPUs and a lot less memory. It could be that ZFS should be enforcing higher minimum hardware requirements to mount (i.e., refusing to run on systems with 32-bit address spaces or <4 GB of memory and inadequate tuning). Robert N M Watson Computer Laboratory University of Cambridge
Before ZFS was released, I was using it internally on a 32bit desktop. It never panic'd although it did get very slow after a while because of the way it managed memory (and probably some bugs :) while in early alpha/beta. At work I run it on my Ultra20 desktop with Solaris 10. It has an AMD64 CPU and I'm pretty sure only 2GB of RAM, but I'll have to check on the RAM. Darren
Solaris nowadays refuses to install on anything without at least 1 GB of memory. I'm all for ZFS refusing to run on inadequately tuned hardware, but apparently there's no algorithmic way to say what *is* adequately tuned, except for "try X and if it crashes, try Y, repeat as necessary". The reason why I'm arguing this topic is that it isn't a matter of tuning like "it will run slowly if you don't tune it" - it's more like "it won't run at all if you don't go through the laborious trial-and-error process of tuning it, including patching your kernel and running a non-GENERIC configuration".
What you appear to be still missing is that ZFS also causes memory exhaustion panics when run on 32-bit Solaris. In fact (unless they have since fixed it), the opensolaris ZFS code makes *absolutely no attempt* to accommodate i386 memory limitations at all. Kris
Citation needed. I'm interested.
Reports on the zfs-discuss mailing list. Kris
Thanks for the pointer. I'm looking at the archives. So far I've found this: http://www.archivum.info/zfs-discuss@opensolaris.org/2007-07/msg00016.html which doesn't mention panics; and this: http://www.archivum.info/zfs-discuss@opensolaris.org/2007-07/msg00054.html which didn't get any replies but the backtrace doesn't include anything resembling a malloc-like call.
I suppose that depends what you mean by stable. It seems stable enough for a number of applications today. It's clearly not widely tested since we haven't shipped a release based on it. It's possible some of the issues of memory requirements won't be fixable in 7.x, but I don't think that's a given. -- Brooks
My yardstick is currently "when a month goes by without anyone This number is not so large. It seems to be easily crashed by rsync, for example (speaking from my own experience, and also some of my I listened to some of Pawel's talks and devsummit brainstormings and I get the feeling *none* of the problems can be fixed in 7.x, especially on i386. I'm just asking for more official confirmation. This is not a trivial question, since it involves deploying systems to be maintained some years into the future - if ZFS will become stable relatively shortly, it might be worth putting up with crashes, but if not, there will be no near-future deployments of it.
I can definitely say this is not *generally* true, as I do a lot of rsyncing/rdiff-backup:ing and similar stuff (with many files / large files) on ZFS without any stability issues. Problems for me have been limited to 32-bit and the memory exhaustion issue rather than "hard" issues. But perhaps that's all you are referring to. -- / Peter Schuller PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>' Key retrieval: Send an E-Mail to getpgpkey@scode.org E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org
It's not generally true since kmem problems with rsync are often hard to repeat - I have them on one machine, but not on another, similar Mostly. I did have a ZFS crash with rsync that wasn't kmem related, but only once.
kmem problems are just tuning. They are not indicative of stability problems in ZFS. Please report any further non-kmem panics you experience. Kris
I have encountered a deadlock twice during high I/O activity (the last one during rsync + rm -r on a 5 GB hierarchy, openoffice-2/work). I was running with this patch: http://people.freebsd.org/~pjd/patches/zgd_done.patch
db> show allpcpu
Current CPU: 1
cpuid = 0
curthread = 0xa5ebe440: pid 3422 "txg_thread_enter"
curpcb = 0xeb175d90
fpcurthread = none
idlethread = 0xa5529aa0: pid 12 "idle: cpu0"
APIC ID = 0
currentldt = 0x50
cpuid = 1
curthread = 0xa56ab220: pid 47 "arc_reclaim_thread"
curpcb = 0xe6837d90
fpcurthread = none
idlethread = 0xa5529880: pid 11 "idle: cpu1"
APIC ID = 1
currentldt = 0x50
Backtraces of the affected processes (or just alltrace) are usually required to proceed with debugging, and lock status is also often vital (show alllocks, requires witness). Also, in the case when threads are actually running (not deadlocked), then it is often useful to repeatedly break/continue and sample many backtraces to try and determine where the threads are looping. Kris
I added it to my kernel config. I did this after the second deadlock, and arc_reclaim_thread was always there and the second CPU was idle.
To repeat, it is important not just to note which thread is running, but *what the thread is doing*. This means repeatedly comparing the backtraces, which will allow you to build up a picture of which part of the code it is looping in. Kris
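The sampling approach described above (break, record a backtrace, continue, repeat) can be summarised by counting how often each function shows up across the samples. What follows is only a minimal sketch of that bookkeeping, not a DDB feature: `hot_frames` and the frame names `func_a` to `func_d` are hypothetical placeholders, and the backtraces are assumed to have been copied out of the debugger by hand as lists of function names.

```python
from collections import Counter


def hot_frames(samples):
    """Return (function, fraction_of_samples) pairs, most frequent first.

    `samples` is a list of backtraces; each backtrace is a list of
    function names (hypothetical input, transcribed from the debugger).
    """
    if not samples:
        return []
    counts = Counter()
    for trace in samples:
        # Count a function once per sample, even if it recurses.
        counts.update(set(trace))
    total = len(samples)
    # Sort by descending frequency, then by name for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, n / total) for name, n in ranked]


if __name__ == "__main__":
    # Placeholder frame names, innermost frame first.
    samples = [
        ["func_c", "func_b", "func_a"],
        ["func_d", "func_b", "func_a"],
        ["func_b", "func_a"],
    ]
    for name, share in hot_frames(samples):
        print(f"{name}: present in {share:.0%} of samples")
```

Functions present in every sample are the likely enclosing loop; functions that appear only in some samples are probably called from inside it.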
I agree that ZFS itself is pretty stable. I use a 32-bit machine with 2 GB of RAM and all the hangs are kmem related, but the fact is that I haven't found any way of tuning it to stop it crashing. When I do some rsyncing, especially between different pools, it hangs or reboots - mostly on bigger files (e.g. rsyncing the ports tree with distfiles). At the moment I have patched the kernel with vm_kern.c.2.patch and it has just stopped crashing, but from time to time the machine appears frozen for a second or two, after which it works normally. Have you had any similar experience? -- regards, Maciej Suszko.
That is expected. That patch makes the system do more work to try and reclaim memory when it would previously have panicked from lack of memory. However, the same advice applies as to Ivan: you should try and tune the memory parameters better to avoid this last-ditch situation. Kris P.S. It sounds like you do not have sufficient debugging configured either: crashes should produce either a DDB prompt or a coredump so they can be studied and understood.
As Ivan said - tuning kmem_size only delays the moment the system crashes. You're right - I turned debugging off, because it's not a production machine and I can afford such behaviour. Right now, using a kernel with the kmem patch applied, it's "usable". -- regards, Maciej Suszko.
So the same question applies: exactly what steps did you take to tune the memory parameters? Extracting this information from you guys shouldn't be as hard as this :) Kris
I was playing around with kmem_max_size mainly. I suppose messing with KVA_PAGES is not a good idea unless you know exactly how much memory your software consumes... -- regards, Maciej Suszko.
I disagree - anything that causes a panic is a stability problem. Panics persist AFTER the tunings (for i386 certainly, and there are unsolved reports about it on amd64 also) and are present even when driving kmem size to the maximum. The tunings *can not solve the problems* currently, they can only delay the time until they appear, which, by Murphy, often means "sometime around midnight on Saturday". See also the possibility of deadlocks in the ZIL, reported by some users. I did, once to Pawel and once to the lists. Pawel couldn't help me and nobody responded on the lists. Can you perform a MySQL read-write benchmark on one of the 8-core machines with the database on ZFS for about an hour without pause? On a machine with 2 GB (or less) of RAM, preferably? I've seen problems on i386 but maybe they are also present on amd64.
That's an assertion directly contradicted by my experience running a heavily loaded 8-core i386 package builder. Please explain in detail the steps you have taken to tune your kernel. Do you have the vm_kern.c patch applied? > See also the possibility of deadlocks in the ZIL, reported by some users. Yes, this is an outstanding issue. There are a couple of others I run I am not set up to test this right now. Kris
What is the IO profile of this usage? I'd guess that it's "short bursts of high activity (archive extraction, installing) followed by long periods of low activity (compiling)". From what I see on the lists and somewhat from my own experience, the problem appears more often when the load is more like "constant high r+w activity", probably with several users (applications) doing the activity in vm.kmem_size="512M" vm.kmem_size_max="512M" I can confirm that while it delays the panics, it doesn't eliminate them (this also seems to be the conclusion of several users who tested it shortly after it was posted). The fact that it's not committed is a good enough indication that it's not The Answer. (And besides, asking users to apply non-committed patches just to run their systems normally is bad practice :) I can just imagine the Release Notes: "if you're using ZFS, you'll have to manually patch the kernel with this patch:..." :) This close to the -RELEASE, I judge the chances of it being committed to be low.)
No, clearly it is not enough (and you claimed previously to have done more tuning than this). I have it set to 600MB on the i386 system with ZFS already tells you up front that it's experimental code and likely to have problems. Users of 7.0-RELEASE should not have unrealistic Perhaps, but that only applies to 7.0-RELEASE. Kris
This looks like we're constantly chasing the "right amount". Does it depend so much on CPU and IO speed that there's never a generally sufficient "right amount"? So when CPU and drive speed increase, the new Where? What else is there except kmem tuning (including KVA_PAGES)? IIRC My point is that the fact that such things are necessary (1.5 GB KVA is a lot on i386) means that there are serious problems which haven't been fixed since ZFS was imported (that's over 6 months ago). I see you've added to http://wiki.freebsd.org/ZFSTuningGuide; can you please add the values that work for you to it (especially for KVA_PAGES since the exact kernel configuration line is never spelled out in the I know it's experimental, but requiring users to perform so much tuning just to get it to work without crashing will mean it will get a bad reputation early on. Do you (or anyone) know the reasons for not setting vm.kmem_size to 512 MB by default? Better yet, why not increase both vm.kmem_size and KVA_PAGES to (the equivalent of) 640 MB or 768 MB by default for 7.0? > Users of 7.0-RELEASE should not have unrealistic expectations. As I said in the first post of this thread: I'm interested in whether it's ever going to be stable for 7.x.
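To put the "1.5 GB KVA" figure in terms of KVA_PAGES, here is a small arithmetic sketch. It rests on assumptions that are not stated in this thread: that on i386 one KVA_PAGES unit covers 4 MB of kernel address space (2 MB on a PAE kernel), that the value must be a multiple of 4, and that the total address space is 4 GB. The function names are hypothetical, and the numbers are illustrations rather than recommended settings.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def unit_mib(pae=False):
    # Assumption: one KVA_PAGES unit maps 4 MB (2 MB with PAE) on i386.
    return 2 if pae else 4


def kva_bytes(kva_pages, pae=False):
    """Kernel virtual address space implied by a KVA_PAGES value."""
    return kva_pages * unit_mib(pae) * MIB


def kva_pages_for(target_mib, pae=False):
    """Smallest KVA_PAGES covering target_mib, rounded up to a multiple of 4."""
    units = -(-target_mib // unit_mib(pae))  # ceiling division
    return -(-units // 4) * 4


def user_space_bytes(kva_pages, pae=False):
    """Address space left to each user process out of the 4 GB total."""
    return 4 * GIB - kva_bytes(kva_pages, pae)


if __name__ == "__main__":
    pages = kva_pages_for(1536)  # 1.5 GB, the figure mentioned above
    print("KVA_PAGES =", pages)
    print("kernel KVA (MB):", kva_bytes(pages) // MIB)
    print("user space (MB):", user_space_bytes(pages) // MIB)
```

Under these assumptions, 1.5 GB of KVA corresponds to a kernel configuration line along the lines of `options KVA_PAGES=384`, which leaves 2.5 GB of address space for each user process. That trade-off is one reason raising it by default is not free.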
It depends on your workload, which in turn depends on your hardware. Tuning is an interactive process. If 512MB is not enough kmem_map, then ZFS is a memory hog. There is nothing that can really be done about this, and it is just not a good fit on i386 because of limitations of the hardware architecture. Note that Sun does not recommend using ZFS on a 32-bit system either, for the same reasons. It is unlikely this can really be fixed, although mitigation strategies like the vm_kern.c Increasing vm.kmem_size_max to 512MB by default has other implications, but it is something that should be considered. That is answered in the tuning guide. Tuning KVA_PAGES by default is This was in reply to a comment you made about the vm_kern.c patch affecting users of 7.0-RELEASE. Anyway, to sum up, ZFS has known bugs, some of which are unresolved by the authors, and it is difficult to make it work on i386. It is likely that the bugs will be fixed over time (obviously), but amd64 will always be a better choice than i386 for using ZFS because you will not be continually bumping up against the hardware limitations. Kris
