It is hard for me to believe that this is a FreeBSD-specific bug, because
checksumming is below the FreeBSD-specific code. Of course anything is
possible, but I just think it's unlikely.

I'd start by configuring UFS on top of GELI with authentication. GELI
will also detect silent data corruption:

# geli init -a hmac/md5 -e null -s 4096 -P -K /dev/null /dev/ad4
# geli attach -p -k /dev/null /dev/ad4
# dd if=/dev/zero of=/dev/ad4.eli bs=1m   # this will take a while
# newfs -U /dev/ad4.eli
# mount -o noatime /dev/ad4.eli /mnt/tmp
Try your DB test on this file system.
--
Pawel Jakub Dawidek http://www.wheel.pl
pjd@FreeBSD.org http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!
Try turning off the ZIL. Whilst I don't use a DB, I have ZFS under high load,
and I've found that without the ZIL turned off I see checksum corruption as well.
In /boot/loader.conf:
vfs.zfs.zil_disable=1
Cheers,
Benjamin
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
Wouldn't it be a bad idea to disable the ZIL?
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Dis...
Regards,
A good read is:
http://blogs.sun.com/perrin/entry/the_lumberjack
It shows why the ZIL exists.
Cheers,
Benjamin
So does anybody know of a battery-backed NVRAM card, usable with
FreeBSD, that the ZIL could be offloaded to?
--
DaveD
Any CF card or similar will do. You don't need battery backup for
flash memory.
DES
--
Dag-Erling Smørgrav - des@des.no
I did think of that, but is a CF card faster than a good SAS or SATA
drive? The fastest ones I found have a top rating of 45MB/s. The one
battery-backed NVRAM card that showed up in a Google search had a
peak rate of 533MB/s. The question seems moot, though, since FreeBSD
doesn't currently support them.

Thanks for your time,
--
DaveD
Not in transfer rate, but it could help hugely with seek-intensive IO
loads (since seeks are effectively instantaneous on flash and other solid-state
drives). In theory, they could be of immense benefit for databases and
seek-intensive operations on file systems, but the limited bulk transfer
rates and relatively small sizes (for decent money) currently prevent
their widespread use.

It would be logical to use a limited-size SSD for something like a file
system journal for a large file system, except that these kind of

If an NVRAM card, SSD, or other device presents itself as a (S)ATA
drive, there's no reason it shouldn't.
That's no longer true. You can't get more than 5-10MB/s from
seek-intensive RAID0 with two 15K drives, while 20-30MB/s is not a
problem for a comparably priced and sized SSD.
-Maxim
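A rough back-of-envelope sketch of why seek-bound disks land in that range: random throughput is roughly IOPS times request size, and per-drive IOPS is bounded by average seek time plus half a rotation. The 3500 us average seek and 16 KB request size below are illustrative assumptions, not figures from this thread, and transfer time is ignored; plug in your own drive's datasheet values.

```shell
#!/bin/sh
# Back-of-envelope random-I/O throughput for rotating disks.
# Assumptions (illustrative only): 3500 us average seek, 16 KB per
# random request, transfer time ignored, RAID0 load spread evenly.

# Average rotational latency in microseconds: half a revolution.
# One revolution takes 60,000,000 / rpm microseconds.
rot_latency_us() {
    rpm=$1
    echo $(( 60000000 / rpm / 2 ))
}

# Random IOPS for one drive: 1 s / (seek + rotational latency).
drive_iops() {
    rpm=$1; seek_us=$2
    echo $(( 1000000 / (seek_us + $(rot_latency_us "$rpm")) ))
}

# Aggregate KB/s for an N-drive RAID0 set.
raid0_kbps() {
    ndrives=$1; rpm=$2; seek_us=$3; req_kb=$4
    echo $(( ndrives * $(drive_iops "$rpm" "$seek_us") * req_kb ))
}

echo "15K rpm rotational latency: $(rot_latency_us 15000) us"
echo "IOPS per drive: $(drive_iops 15000 3500)"
echo "2-drive RAID0: $(raid0_kbps 2 15000 3500 16) KB/s"
```

With these assumed numbers the estimate is 181 IOPS per drive and 5792 KB/s (about 5.8 MB/s) for the pair, consistent with the 5-10MB/s figure; larger requests or shorter seeks push it up, but not by an order of magnitude.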
You can get a 64GB SSD for under US$1K now, which is comparable price-wise to
RAID0 with two 70GB 15K SAS drives. For example:
http://accessories.us.dell.com/sna/productdetail.aspx?sku=341-5582&cs=04...
-Maxim
Kingston CF Elite, 20 / 25 MBps write / read
Kingston CF Ultimate, 40 / 45 MBps write / read
SanDisk Extreme III CF, 20 MBps
SanDisk Extreme IV CF, 45 MBps
Sony CF 300X, 45 MBps
These are just a few of those available from my regular supplier.
DES
--
Dag-Erling Smørgrav - des@des.no
These are all "normal" CompactFlash cards, for which the largest widely
available size seems to be 16 GB, right? I was thinking about
something more like this:
http://gizmodo.com/gadgets/peripherals/adatas-128gb-solid-state-drive-sees-the-light-of-day-231693.php
or this: http://www.mtron.net/English/Product/pc_msd1000.asp

Have you (or anyone else) deployed CF drives in production servers?
So? That's more than enough for a ZFS intent log (as a rule of thumb,
My router (and DNS, NTP and DHCP server) is a net4801 with a 1 GB CF
chip.
DES
--
Dag-Erling Smørgrav - des@des.no
If you're using CompactFlash for something that's constantly updated,
like a ZIL, wouldn't your CF card die very quickly?

I've deployed CF in production, but as a read-only medium with
occasional writes only for configuration updates.

From what I understand, the specialized, expensive solid-state drives
that you are discussing are better designed for this type of write
duty, whereas CF would probably not last very long.

Since a ZIL is not really seek-intensive, why not just offload it to its
own standard hard disk with write caching and all other similar
data-corrupting technologies disabled?
Yes. I don't see the point of putting a log that's mostly accessed sequentially
on an SSD, where it probably wears the same areas of the drive.
I'm more interested in loads like databases.
CF cards and flash-based SSDs rotate the flash cells (wear leveling) anyway, so it
doesn't matter that much whether you write the same block or not.
I wouldn't worry about wearing out those devices, since today's media

I wouldn't do both with them unless required for a specific reason.
The problem is how they work.
They contain NAND flash chips, which have two data areas containing
data blocks of typically slightly more than 4 or 8 kB these days.
One area is 100% error-free with a high write rate, but small; the
other is of much lower quality, but large.
Devices use the latter for the data blocks they offer and the good cells
for maintaining the allocation of those blocks.
One problem is that, with the data blocks being that big, writing
512 bytes effectively means a read-modify-write of a larger physical
block.
This can be handled quite well with a larger FS block size.
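The read-modify-write cost can be put into numbers with a small sketch. It assumes, for illustration only, a 4096-byte flash page, writes aligned to page boundaries, and a device that rewrites every page a write touches in full; real controllers differ and usually don't document this.

```shell
#!/bin/sh
# Bytes a flash device physically rewrites for one logical write.
# Hypothetical model: aligned writes, whole-page read-modify-write.
physical_bytes() {
    write_bytes=$1; page_bytes=$2
    # Round the write up to a whole number of pages.
    pages=$(( (write_bytes + page_bytes - 1) / page_bytes ))
    echo $(( pages * page_bytes ))
}

# Write amplification as a percentage (100 = no amplification).
amplification_pct() {
    echo $(( $(physical_bytes "$1" "$2") * 100 / $1 ))
}

for fs_block in 512 2048 4096 16384; do
    echo "FS block $fs_block B on 4096 B pages:" \
         "$(physical_bytes "$fs_block" 4096) B written," \
         "$(amplification_pct "$fs_block" 4096)%"
done
```

Under this model a 512-byte write costs a full 4096-byte page (800%), while a 4096-byte or larger FS block brings it back to 100%, which is why a larger FS block helps.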
The much bigger problem is power loss while writing such a
maintenance block.
You lose a very large area of logical blocks when this fails,
since a 4k maintenance block contains the allocation for several hundred
kB of logical data blocks.
In other words, you can lose data blocks that were not written for
a long time, and the database wouldn't expect a problem with that data.
Even for the ZIL it is very questionable if you lose a large data area,
since its purpose is to keep the data that was already synced readable
after a power loss.
I'm not sure what happens in case of a device reset at the wrong moment;
possibly this depends on the specific media, but I wouldn't be surprised
to see read errors after a reset without power loss as well.
This is true of all NAND-based flash media: SD, MMC, SM, CF, ...
There are media which are less critical because of the way they utilize
the maintenance blocks, but those details are usually a vendor
secret.
I do run PostgreSQL on SD media with ARM-based FreeBSD systems, but
I'm prepared to lose the whole database and recover it from backup
if things go wrong.
--
B.Walter
Bernd Walter wrote:
...
ZFS doesn't suffer from this problem because its design
is to always write a new section of data rather than
overwrite "current" data.

So if you lose power in the middle of a write to a data
block, there is no damage to the old data.
Darren
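The "never overwrite in place" idea can be illustrated at the file level in shell: write the new version to a separate file, flush, then switch to it with mv, which is a single atomic rename(2) within one file system. This is only an analogy for copy-on-write, not how ZFS implements it internally; the function and file names are hypothetical. A crash before the mv leaves the old data intact.

```shell
#!/bin/sh
# Copy-on-write style update of a file: never modify the live copy in place.
# Analogy only; cow_update and the file names are hypothetical.
cow_update() {
    target=$1; new_contents=$2
    tmp="$target.new.$$"
    # 1. Write the new version to a separate location.
    printf '%s\n' "$new_contents" > "$tmp" || return 1
    # 2. Flush dirty buffers. sync is system-wide and coarse; a real
    #    implementation would fsync just this one file.
    sync
    # 3. Atomically replace the old version (rename within one file system).
    #    Until this step, readers (and a crash) still see the old data.
    mv -f "$tmp" "$target"
}

dir=$(mktemp -d)
printf '%s\n' "old record" > "$dir/block"
cow_update "$dir/block" "new record"
cat "$dir/block"
rm -rf "$dir"
```

The design choice mirrors the argument above: because the old copy is never touched until the final switch, an interrupted update costs only the new version, not the existing data.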
This may also have to wait for a future version of ZFS. I remember
reading about this kind of thing as an upcoming feature in Solaris. I
believe the way this feature would work is that ZFS would allow creating
the ZIL on a different pool from the filesystem, i.e. create a zpool on
the CF card and get the ZIL to live there.
AFAIK this is already implemented in ZFS, though I'm not sure Pawel has
merged it into FreeBSD yet.

Note that you can also get disk drives with a certain amount of NAND
flash built in, but FreeBSD doesn't support that yet.
DES
--
Dag-Erling Smørgrav - des@des.no
http://www.opensolaris.org/os/community/zfs/version/7/
You simply add a 'log' vdev to a pool. It's included in snv_75, maybe
earlier.
/Kenneth
They are intended for use with Vista, and have recently been found to
be only marginally more effective than a placebo (for Vista performance):
the speed gains are barely distinguishable from measurement error.
So if these drives helped ZFS, it would be quite ironic.

There are companies that manufacture "pure" SSDs (www.superssd.com,
www.soliddata.com) with battery backup.
Unfortunately, the price tag of these systems is still beyond the reach of
normal customers.

cheers,
Rainer
Yes. However, FreeBSD suffers from deadlocks under load if ZIL is enabled.
-Kip
> Yes. However, FreeBSD suffers from deadlocks under load if ZIL is enabled.
Is there a mailing list post or documentation on this? I am trying to keep
up-to-date on ZFS status (on FreeBSD and otherwise), but I don't think this
has been discussed on any of the usual mailing lists.
--
/ Peter Schuller
PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org
Should I open a page on wiki.freebsd.org to track the ZFS-related bugs? :)
ed.
Do you know how such deadlocks manifest? Do they perhaps result in a
process stuck in the "zfs" wchan (not "zfs:&...")?
It also comes down to what you're doing. ZFS is always consistent on disk.
The ZIL provides the journal between the last pool transaction write and
what has changed since that write. Either way, ZFS will come up cleanly
after a power failure; it's just a question of whether you have those last
few syncs or not.
For the application I'm using ZFS for (rsynced backups, snapshotted
daily), that'll be corrected the next day anyway. For a DB, this could be
a showstopper.

Cheers,
Benjamin
