Re: [ANNOUNCE] Ramback: faster than a speeding bullet

To: <linux-kernel@...>
Date: Monday, March 10, 2008 - 2:46 am

Every little factor of 25 performance increase really helps.

Ramback is a new virtual device with the ability to back a ramdisk
by a real disk, obtaining the performance level of a ramdisk but with
the data durability of a hard disk.  To work this magic, ramback needs
a little help from a UPS.  In a typical test, ramback reduced a 25
second file operation[1] to under one second including sync.  Even
greater gains are possible for seek-intensive applications.

The difference between ramback and an ordinary ramdisk is: when the 
machine powers down the data does not vanish because it is continuously 
saved to backing store.  When line power returns, the backing store
repopulates the ramdisk while allowing application io to proceed 
concurrently.  Once fully populated, a little green light winks on and 
file operations once again run at ramdisk speed.

So now you can ask some hard questions: what if the power goes out 
completely or the host crashes or something else goes wrong while 
critical data is still in the ramdisk?  Easy: use reliable components.  
Don't crash.  Measure your UPS window.  This is not much to ask in 
order to transform your mild mannered hard disk into a raging superdisk 
able to leap tall benchmarks at a single bound.

If line power goes out while ramback is running, the UPS kicks in and a 
power management script switches the driver from writeback to 
writethrough mode.  Ramback proceeds to save all remaining dirty data 
while forcing each new application write through to backing store 
immediately.
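
The mode switch described above can be pictured with a toy model.  This
is only a sketch under stated assumptions, not the driver's code: the
class name, the per-chunk dirty set and the on_line_power_lost() hook
are all made up for illustration, and chunks are plain Python objects
rather than block io.

```python
class RambackModel:
    """Toy model of a ramdisk backed by a slower store, tracked per chunk."""

    def __init__(self, num_chunks):
        self.ram = [b""] * num_chunks      # fast copy, always current
        self.backing = [b""] * num_chunks  # stable copy, may lag behind
        self.dirty = set()                 # chunks not yet saved to backing
        self.writethrough = False          # False means writeback mode

    def write(self, chunk, data):
        self.ram[chunk] = data
        if self.writethrough:
            # On UPS power: force the write through to backing store now.
            self.backing[chunk] = data
            self.dirty.discard(chunk)
        else:
            # On line power: mark the chunk dirty and return at once.
            self.dirty.add(chunk)

    def flush_one(self):
        """Save one dirty chunk; return False when nothing is left."""
        if not self.dirty:
            return False
        chunk = min(self.dirty)
        self.backing[chunk] = self.ram[chunk]
        self.dirty.discard(chunk)
        return True

    def on_line_power_lost(self):
        """Hypothetical hook a power management script would trigger."""
        self.writethrough = True
        while self.flush_one():
            pass


disk = RambackModel(num_chunks=8)
disk.write(0, b"a")
disk.write(3, b"b")
dirty_before = sorted(disk.dirty)   # chunks waiting for writeback
disk.on_line_power_lost()           # switch mode, drain dirty chunks
disk.write(5, b"c")                 # now goes straight to backing store
```

In this model the UPS window has to cover at least the time taken by
the loop in on_line_power_lost(); after that, the backing store never
lags the ramdisk.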

If UPS power runs out while ramback still holds unflushed dirty data 
then things get ugly.  Hopefully a fsck -f will be able to pull 
something useful out of the mess.  (This is where you might want to be 
running Ext3.)  The name of the game is to install sufficient UPS power 
to get your dirty ramdisk data onto stable storage this time, every 
time.

The basic design premise of ramback is alluringly simple: each write to 
a ramdisk sets a per-chunk dirty ...
To: Daniel Phillips <phillips@...>
Cc: <linux-kernel@...>
Date: Wednesday, March 12, 2008 - 8:01 am

What about doing a similar thing as a device mapper target? Have a look at
dm-cache. I know that development of it has stopped, but that doesn't mean
it couldn't be resurrected. It has the advantage that it is generic (any
two block devices will do) and you don't need to populate the "cache" on
start-up - it happens automatically through cache misses.

Another use could be a flash based disk accelerator which may be pretty 
popular nowadays.

Tvrtko


--
To: <tvrtko.ursulin@...>
Cc: <linux-kernel@...>
Date: Wednesday, March 12, 2008 - 1:27 pm

It is a device mapper target (though there is no real advantage in that
other than having a handy plug-in api).  It does handle any two block
devices, and it does populate on cache miss.  But it also has daemon-driven
population, since it never makes sense to leave the backing disk idle
and then incur read latency later because of that.
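
A rough sketch of the two population paths, purely for illustration:
the Populator class and its method names are invented here, and the
sketch ignores writes to chunks that are not yet populated, which a
real driver has to handle.

```python
class Populator:
    """Toy model of filling a ramdisk from its backing store."""

    def __init__(self, backing):
        self.backing = list(backing)
        self.ram = [None] * len(self.backing)
        self.present = [False] * len(self.backing)
        self.backing_reads = 0  # counts slow reads from the backing disk

    def _load(self, chunk):
        self.ram[chunk] = self.backing[chunk]
        self.present[chunk] = True
        self.backing_reads += 1

    def read(self, chunk):
        # Populate on cache miss: the application pays the disk latency.
        if not self.present[chunk]:
            self._load(chunk)
        return self.ram[chunk]

    def daemon_step(self):
        """Load the next missing chunk while the backing disk is idle.

        Returns False once the ramdisk is fully populated.
        """
        for chunk, here in enumerate(self.present):
            if not here:
                self._load(chunk)
                return True
        return False


p = Populator(["a", "b", "c", "d"])
first = p.read(2)            # miss: served from backing, then cached
while p.daemon_step():       # daemon fills in the remaining chunks
    pass
reads_after_fill = p.backing_reads
again = p.read(0)            # hit: no further backing store read
```

Once the daemon has finished, every read is a hit, which is the point
of not leaving the backing disk idle.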

Regards,

Daniel
--
To: Daniel Phillips <phillips@...>
Cc: <linux-kernel@...>
Date: Tuesday, March 11, 2008 - 1:06 am

/proc is so 1990's.  As your code has nothing to do with processes,
please don't add new files in /proc/.  sysfs is there for you to do

Use debugfs for debug info like this.

thanks,

greg k-h
--
To: Greg KH <greg@...>
Cc: <linux-kernel@...>
Date: Tuesday, March 11, 2008 - 1:22 am

Demonstrate some advantage and I will think about it.

Daniel
--
To: Daniel Phillips <phillips@...>
Cc: <linux-kernel@...>
Date: Tuesday, March 11, 2008 - 2:27 am

Again, as your code has nothing to do with "processes", please do not
add new files to /proc.

As you are a filesystem, why not /sys/fs/ ?

It ends up with smaller code than procfs stuff as well, a good and nice
advantage.

thanks,

greg k-h
--
To: Daniel Phillips <phillips@...>
Cc: Greg KH <greg@...>, <linux-kernel@...>
Date: Tuesday, March 11, 2008 - 1:48 am

Use of /proc is discouraged. If you insist on sticking with it in the face
of opposition, you will seriously hurt the chances of your patches being
accepted.

David Lang
--
To: Daniel Phillips <phillips@...>
Cc: <linux-kernel@...>
Date: Monday, March 10, 2008 - 2:49 pm

So that's what I've been doing wrong for all these years...

-- Chris
--
To: Daniel Phillips <phillips@...>
Cc: <linux-kernel@...>
Date: Monday, March 10, 2008 - 10:51 am

So you apparently want three things:

a) ignoring fsync() and co on this device
b) disabling all write throttling on this device
c) never discarding cached data from this device

anything else i'm missing?

Alan already suggested the ramfs+writeback thread approach (possibly
with a little bit of help from the fs which could report just the dirty
regions), but i'm not sure even that is necessary.

(a) can be easily done (fixing the app, LD_PRELOAD or fs extension etc)
(b) couldn't the per-device write throttling be used to achieve this?
(c) shouldn't be impossible either, eg sticking PG_writeback comes to mind,
    just the mm accounting needs to remain sane.

IOW can't this be done in a more generic way (and w/o a ramdisk in the

apples to oranges. what are the numbers for a nonjournalled disk-backed
fs and _without_ the sync? (You're not committing to stable storage anyway
so the sync is useless and if you don't respect the ordering so is the
journal)

artur 
--
To: Daniel Phillips <phillips@...>
Cc: <linux-kernel@...>
Date: Monday, March 10, 2008 - 5:22 am

Nice fiction - stuff crashes eventually - not that this isn't useful. For
a long time simply loading a 2-3GB Ramdisk off hard disk has been a good

Ext3 is only going to help you if the ramdisk writeback respects barriers


Why not - providing you clear the dirty bit before the write and you
check it again after ? And on the disk size as you are going to have to
suck all the content back in presumably a log structure is not a big


If you are prepared to go bigger than the fs chunk size so lose the
ordering guarantees your chunk size really ought to be *big* IMHO

Alan
--
To: Alan Cox <alan@...>
Cc: <linux-kernel@...>
Date: Monday, March 10, 2008 - 11:50 pm

Hi Alan,

Nice to see so many redhatters taking an avid interest in storage :-)


Right, and now with ramback you will be able to preserve that state and


But that does not satisfy the requirement you snipped:

 * Applications need to be able to read and write ramback data during

More accurately: in general, cannot transfer directly.  The ramdisk may
be external and not present a memory interface.  Even an external
ramdisk with a memory interface (the Violin box has this) would require
extra programming to maintain cache consistency.  Then there is the
issue of ramdisks on the way that exceed the 40 bit physical addressing
of current generation processors.

Even for the simple case where the ramdisk is just part of the kernel
unified cache, I would rather not go delving into that code when these
transfers are on the slow path anyway.  Application IO does its normal
single copy_to/from_user thing.  If somebody wants to fiddle with vm,
the place to attack is right there.  The copy_to/from_user can be
eliminated (provided alignment requirements are met) using stupid page
table tricks.  In spite of Linus claiming there is no performance win 


"640K should be enough for anyone"


The finer the granularity the faster the ramdisk syncs to backing
store.  The only attraction of coarse granularity I know of is
shrinking the bitmap, which is currently not so big that it presents
a problem.
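
To put rough numbers on the bitmap size, here is a back-of-envelope
calculation assuming one dirty bit per chunk.  The 512 GiB device size
and both chunk sizes are illustrative assumptions, not figures taken
from the driver.

```python
def bitmap_bytes(device_bytes, chunk_bytes):
    """Size of a one-bit-per-chunk dirty bitmap, rounded up to whole bytes."""
    chunks = -(-device_bytes // chunk_bytes)  # ceiling division
    return -(-chunks // 8)


KiB = 1 << 10
MiB = 1 << 20
GiB = 1 << 30

fine = bitmap_bytes(512 * GiB, 4 * KiB)    # fine granularity
coarse = bitmap_bytes(512 * GiB, 1 * MiB)  # coarse granularity
print(fine // MiB, "MiB versus", coarse // KiB, "KiB")
```

Under these assumptions the fine-grained bitmap costs 16 MiB against
64 KiB for the coarse one, which is small next to a ramdisk of that
size.  On the other side of the trade, a single small write dirties a
whole chunk, so the coarse setting has 256 times more data to flush
per scattered write.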

Your comment re fs chunk size reveals that I have failed to
communicate the most basic principle of the ramback design: the
backing store is not expected to represent a consistent filesystem
state during normal operation.  Only the ramdisk needs to maintain a
consistent state, which I have taken care to ensure.  You just need
to believe in your battery, Linux and the hardware it runs on.  Which
of these do you mistrust?

Regards,

Daniel
--
To: Daniel Phillips <phillips@...>
Cc: <linux-kernel@...>
Date: Wednesday, March 12, 2008 - 9:11 am

Actually no - ramback would be useless to this. You might crash and end

Oh you mean "pray hard". e2fsck works well with typical disk style
failures, it is not robust against random chunks vanishing. I know this
as I've worked on and debugged a case where a raid card rebooted silently



I was suggesting that you want log structure for the writeback disk so
that you keep coherency and can recover it, an issue you seem intent on



No I get that. You've ignored the fact I'm suggesting that design choice

In a big critical environment - all three.

Alan
--
To: Alan Cox <alan@...>
Cc: <linux-kernel@...>
Date: Wednesday, March 12, 2008 - 1:29 pm

So then you know that people already rely on batteries in critical storage
applications.  So I do not understand why all the FUD from you.

Particularly about Ext2/Ext3, which does recover well from random damage.

You seem to be calling Linux unreliable.

Daniel
--
To: Daniel Phillips <phillips@...>
Cc: Alan Cox <alan@...>, <linux-kernel@...>
Date: Thursday, March 13, 2008 - 1:39 am

By "recover well", you must mean "loses massive swaths of data, leaving
the system unbootable and with enormous numbers of user files missing." 
My experience.

Expecting fsck to cover for missed writes is stupid.
--
To: David Newall <davidn@...>
Cc: Alan Cox <alan@...>, <linux-kernel@...>
Date: Thursday, March 13, 2008 - 2:14 am

Whatever it can get off the disk it gets.  It does a good job.  If you
don't think so, then don't tell me, tell Ted.

Daniel
--
To: Daniel Phillips <phillips@...>
Cc: David Newall <davidn@...>, <linux-kernel@...>
Date: Thursday, March 13, 2008 - 9:22 am

On Wed, 12 Mar 2008 22:14:16 -0800

He knows. Ext3 cannot recover well from massive loss of intermediate
writes. It isn't a normal failure mode and there isn't sufficient fs
metadata robustness for this. A log structured backing store would deal
with that but all you apparently want to do is scream FUD at anyone who
doesn't agree with you.

Alan
--
To: Alan Cox <alan@...>
Cc: David Newall <davidn@...>, <linux-kernel@...>
Date: Thursday, March 13, 2008 - 3:14 pm

Scream is an exaggeration, and FUD only applies to somebody who
consistently overlooks the primary proposition in this design: that the
battery backed power supply, computer hardware and Linux are reliable
enough to entrust your data to them.  I say this is practical, you say
it is impossible, I say FUD.

All you are proposing is that nobody can entrust their data to any
hardware.  Good point.  There is no absolute reliability, only degrees
of it.

Many raid controllers now have battery backed writeback cache, which
is exactly the same reliability proposition as ramback, on a smaller
scale.  Do you refuse to entrust your corporate data to such
controllers?

Daniel
--
To: Daniel Phillips <phillips@...>
Cc: Alan Cox <alan@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Saturday, March 15, 2008 - 4:59 pm

RAID controllers do not have half a terabyte of RAM. Also, you are always
invited to choose between speed (write back) and reliability (write through).

Also, please note that the problem here is not related to the number of
nines of availability. This number only counts the ratio between uptime
and downtime. We're more facing a problem of MTBF, where the consequences
of a failure are hard to predict.

What I'm thinking about is that considering the fact that storage
technologies are moving towards SSD (and I think 2008 will be the
year of SSD), you should implement ordered writes (I've not said
write through) since there's no seek time on those devices. Thus
you will have the speed of RAM with the reliability of a properly
synced FS. If your system crashes once a week, it will not be a
problem anymore.

Willy

--
To: Willy Tarreau <w@...>
Cc: Daniel Phillips <phillips@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Saturday, March 15, 2008 - 4:56 pm

The write back ones are also battery backed properly, and will switch
to write through (flushing out the cache) on the first sniff of a low
battery signal.

The decent ones (the kind used in serious business) also let you swap the
battery backed RAM module to another card in the event of a failure of a
card so you can complete recovery.
--
To: Alan Cox <alan@...>
Cc: Willy Tarreau <w@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Saturday, March 15, 2008 - 5:25 pm

Right, just like the Violin 1010, whose PCI-e cable can be hotplugged
into a different server.  Or plugged into two servers at the same time,
because each 1010 has two PCI-e interfaces, so this can be done without
manual intervention.

See, we really are talking about the same thing.  Except that ramback
does it bigger and faster.

Daniel
--
To: Daniel Phillips <phillips@...>
Cc: Willy Tarreau <w@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Saturday, March 15, 2008 - 5:08 pm

On Sat, 15 Mar 2008 13:25:48 -0800

No because you don't honour the ordering and tag boundaries as they do.

Alan
--
To: Alan Cox <alan@...>
Cc: Willy Tarreau <w@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Saturday, March 15, 2008 - 5:51 pm

Sophism.  The statement was "battery backed properly" and "switch on
first sniff", which is exactly how ramback works.

Daniel
--
To: Willy Tarreau <w@...>
Cc: Alan Cox <alan@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Saturday, March 15, 2008 - 5:17 pm

And?  Either you have battery backed ram with critical data in it or


That is why I keep recommending that a ramback setup be replicated or
mirrored, which people in this thread keep glossing over.  When
replicated or mirrored, you still get the microsecond-level transaction
times, and you get the safety too.

Then there is a big class of applications where the data on the ramdisk
can be reconstructed, it is just a pain and reduces uptime.  These are
potential ramback users, and in fact I will be one of those, using it

There will be a whole bunch of patches from me that are SSD oriented,
over time.  The fact is, enterprise scale ramdisks are here now, while
enterprise scale flash is not.  Getting close, but not here.  And flash
does not approach the write performance of RAM, not now and probably
not ever.

Daniel
--
To: Daniel Phillips <phillips@...>
Cc: Willy Tarreau <w@...>, Alan Cox <alan@...>, <linux-kernel@...>
Date: Sunday, March 16, 2008 - 1:42 am

Do you mean it should be replicated with a second ramback?  That would
be pretty pointless, since all failure modes would affect both.  It's
not like one ramback will survive a crash when the other doesn't.
--
To: David Newall <davidn@...>
Cc: Daniel Phillips <phillips@...>, Willy Tarreau <w@...>, Alan Cox <alan@...>, <linux-kernel@...>
Date: Sunday, March 16, 2008 - 6:15 pm

It could, in a bit different location maybe, but it isn't a substitute
for ordered writes.
-- 
Krzysztof Halasa
--
To: Krzysztof Halasa <khc@...>
Cc: David Newall <davidn@...>, Willy Tarreau <w@...>, Alan Cox <alan@...>, <linux-kernel@...>
Date: Sunday, March 16, 2008 - 6:38 pm

How so?

Daniel
--
To: Daniel Phillips <phillips@...>
Cc: David Newall <davidn@...>, Willy Tarreau <w@...>, Alan Cox <alan@...>, <linux-kernel@...>
Date: Sunday, March 16, 2008 - 7:08 pm

Not sure if I understand the question correctly but obviously a pair
(mirror) of servers running "dangerous" ramback would survive a crash
of one machine and we could practically eliminate the probability of
both (all) machines crashing simultaneously. However, there are
cheaper ways to achieve similar performance and even better
reliability - including those battery-backed (RAI)Disk controllers.
-- 
Krzysztof Halasa
--
To: Krzysztof Halasa <khc@...>
Cc: David Newall <davidn@...>, Willy Tarreau <w@...>, Alan Cox <alan@...>, <linux-kernel@...>
Date: Sunday, March 16, 2008 - 7:43 pm

OK, so we are only searching for the cheapest way to achieve these
kinds of speeds, for some given uptime and risk level requirements.
That is a really interesting subject, but can we please leave it for a
while so I can get some work done on the code itself?

Thanks,

Daniel
--
To: David Newall <davidn@...>
Cc: Willy Tarreau <w@...>, Alan Cox <alan@...>, <linux-kernel@...>
Date: Sunday, March 16, 2008 - 4:48 pm

A second machine running a second ramback, on a second UPS pair.
I thought that was obvious.

Daniel
--
To: <linux-kernel@...>
Date: Saturday, March 15, 2008 - 7:18 pm

Besides, some SAN storage devices do have that amount of RAM. However, it is
better protected than in your typical PC. With mirroring, it can be removed
(including the battery packs) - and there is a procedure to actually replay
the buffers once the new devices are in place.

But that's not an argument against or in favor of Ramback, it's just two
different things. You would be surprised how many databases run on write back
mode disks without fdsync() and nobody cares :)

Greetings
Bernd
--
To: Daniel Phillips <phillips@...>
Cc: Alan Cox <alan@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Saturday, March 15, 2008 - 5:54 pm

It completely changes the method to power it and the time the data may
remain in RAM. The Smart 3200 I have right here simply has lithium
batteries directly connected to the static RAM chips. Very low risk of
power failure. The way you presented your work shows it relies on a UPS
to sustain the PC's power supply, which in turn keeps the PC alive,
which in turn tries not to reboot to keep its RAM consistent. There are
a lot of reasons here to get a failure.

Don't get me wrong, I still think your project has a lot of uses. But
you have to admit that there are huge differences between using it in
an appliance with battery-backed RAM which is able to recover data after
a system crash, power outage or anything, and the average Joe's PC setup
as an NFS server for the company with a cheap UPS to try not to lose the
data should a power outage occur.


I agree, but in this case, you should present it this way. You have been
insisting too much on the average PC's reliability, the fact that no kernel
ever crashed for you, etc... So you are demonstrating that your product is
good provided that everything goes perfectly. All people who have experienced
software or hardware problems in the past (ie mostly everyone here) will not
trust your code because it relies on pre-requisites they know they do not

My goal is not to replace RAM with flash, but disk with flash. You are
against ordered writes for a performance reason. Use SSD instead of
hard drives and it will be as fast as sequential writes. Also, when
you say that enterprise scale flash is not there, I don't agree. You
can already afford hundreds of gigs of flash in 3.5" form factor. A
1.6 TB SSD has even been presented at CES2008, with sales announced
for Q3. So clearly this will replace your hard drives soon, very soon.
Even if it costs $5k, that's a very acceptable solution to replace a
disk in a RAM-speed appliance.

Willy

--
To: Willy Tarreau <w@...>
Cc: Alan Cox <alan@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Saturday, March 15, 2008 - 6:33 pm

It already has ordered write when it is in flush mode.

OK, I hear you. There will be an ordered write mode that uses barriers
to decide the ordering.  It will greatly reduce the speed at which
ramback can flush dirty data because of the need to wait synchronously
on every barrier, of which there are many.  And thus will widen out the
window during which UPS power must remain available if power goes out,
in order to get all acknowledged transactions on to stable media.  The
advantage is, the stable media always has a point-in-time version of
the filesystem.
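
A minimal sketch of what flushing in barrier order could look like,
assuming the driver keeps a log of writes and barriers in submission
order.  The record format and the function name are invented for this
illustration; the model only interrupts the flush between epochs and
treats each epoch as reaching the media in one step, which glosses
over what happens to writes in flight inside an epoch.

```python
def flush_in_barrier_order(log, backing, crash_after_epochs=None):
    """Flush a write log to backing store one barrier epoch at a time.

    log holds ("write", chunk, data) and ("barrier",) records in
    submission order.  backing is a dict of chunk -> data and is updated
    in place.  crash_after_epochs simulates losing power after that
    many epochs have reached stable media.
    """
    epoch = {}  # writes collected since the previous barrier
    done = 0    # epochs fully written so far
    for record in log + [("barrier",)]:  # implicit final barrier
        if record[0] == "write":
            _, chunk, data = record
            epoch[chunk] = data
        else:
            if crash_after_epochs is not None and done == crash_after_epochs:
                return backing
            # Wait for every write of this epoch before starting the next.
            backing.update(epoch)
            epoch = {}
            done += 1
    return backing


log = [
    ("write", 0, "A1"), ("write", 1, "B1"), ("barrier",),
    ("write", 0, "A2"), ("barrier",),
    ("write", 1, "B2"),
]
full = flush_in_barrier_order(log, {})
crashed = flush_in_barrier_order(log, {}, crash_after_epochs=1)
```

Here crashed holds the contents as of the first barrier rather than a
mix of old and new chunks.  The synchronous wait at each barrier is
what slows the flush down and widens the UPS window.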

Don't expect this mode in the immediate future though, there are bugs
to fix in the current driver, which already implements the required

That would have been a miscommunication then.  I see arguments coming
in that suggest embedded solutions, EMC for example, are inherently more
reliable than a Linux based solution.  Well guess what?  Some of those
embedded solutions already use Linux.

Also, peecees are much more reliable than people give them credit for,
especially if you harden up the obvious points of failure such as fans
and spinning disks.  Once you have your system all hardened up, then
you _still_ better replicate your important data.  Perhaps I should not
admit this, but I simply fail to do that on the machine from which I am
posting right now, which also runs my web server and mail system.  That
is because I would have to reboot it to install ddsnap so I can replicate
properly, and because the thing is so darn reliable that I just have
not gotten around to it.  I do copy off the important files from time
to time though, and do various other things to ameliorate the risk.  If


Exactly what I mean: close but not there.  Those gigantic RAM boxes are
shipping now, and the same company has got a 5 TB flash box coming down
the pipe, and sooner than Q3.  But the RAM box will always outperform
the flash box.  You just keep throwing writes at it until all available
flash is in erase mode, and the thing slows down.  If ...
To: Daniel Phillips <phillips@...>
Cc: Alan Cox <alan@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Saturday, March 15, 2008 - 7:22 pm

But their RAM does not depend on a lot of factors to remain valid and


Securing every component simply reduces the risk of a loss of service.
What is important with data is to know the consequences of loss of service.
If that only means that no one can work and that the last second of work is
lost, it's generally acceptable. If it means everything is lost to a corrupted

No, you're replacing disk activity with RAM activity. But you keep disk as

Sorry if I was not clear. I was not speaking about replacing the RAM with
flash, but only the disks. You keep the RAM for the speed, and use flash
for permanent storage instead of disks. No seek time, average RW speed now
slightly better than disks: that, combined with your ramdisk and ordered
write-back, will give the best of both worlds: RAM speed and flash
reliability.

Willy

--
To: Willy Tarreau <w@...>
Cc: Alan Cox <alan@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Saturday, March 15, 2008 - 11:33 pm

For example?

Anecdote time.  Remember there used to be "brand name" floppy disks and
generic floppy disks, and the brand name ones cost a lot more because
they were supposedly safer?  Well, big secret, studies were done and
the no-name disks came out better.  Why?  Because selling at commodity
prices the generic makers could not afford returns.  So they made them
well.

It is like that with PCs.  Supposedly you get a lot more reliability
when you spend more money and buy all high end near-custom gear.  In
fact, the cheap stuff just keeps on chugging, because those guys can't
afford to have it break.

So please don't underestimate the reliability of a PC.

There are bits of Linux that are undeniably dodgy.  We get a lot of bug
reports about usb for example, keyboards just quitting and it's not the
keyboard's fault.  Just say no to usb in a server, at least until some
fundamental cleanup happens there.

The worst bug I've seen in a server this year?  A buggy bios in a Dell
server that would issue a keyboard error and sit and wait for somebody
to press F1 when there was no keyboard attached.  That is embedded
software for you.  Personally, I think we do way better than that in

Yes.  Dual power supplies are highly recommended for this application.
With dual power supplies you can carry out preemptive maintenance on

So mirror two of them, I keep saying.  If that is not good enough for
you, then make it three way, and replicate for good measure.  The thing
is, none of that hurts the microsecond level performance, and it gets
you whatever data security you desire.  Whereas anything that requires
waiting on disk transactions does hurt performance.  Since my interest
currently lies in high performance, that is where my effort goes.  And
do I need to say it: patches gratefully accepted.

For my immediate application... hacking the kernel in comfort... just

Right.  What we are talking about is filling in a missing level in the
cache hierarchy, something like:

   L1 .3 ns
   ...
To: Daniel Phillips <phillips@...>
Cc: Willy Tarreau <w@...>, Alan Cox <alan@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Sunday, March 16, 2008 - 6:02 pm

I don't think so. I remember we had many more problems with noname
disks. And yes, certain brands had been problematic too, but most

The real life can't agree with this at all. The servers keep working
for years and the cheap stuff quit fast (if initially working, which

Most BIOS (all I've seen in this Millennium) have an option to disable
that.

On a server board you can usually have a remote console, how could


We already have RAM between L3 and Flash.
The problem is flushing L1 to disk/flash takes time.
-- 
Krzysztof Halasa
--
To: Daniel Phillips <phillips@...>
Cc: Willy Tarreau <w@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Sunday, March 16, 2008 - 9:14 am

They don't care if it breaks after 12 months, and for components and
addons they don't care if it breaks, they just blame the end user for
mis-installation or 'incompatibility'. There is a huge difference in

Perhaps. But if your cache can destroy the contents of the layer below in
situations that do occur it isn't useful. If you can fix that then it
obviously has a lot of potential.

Alan
--
To: Alan Cox <alan@...>
Cc: Daniel Phillips <phillips@...>, Willy Tarreau <w@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Sunday, March 16, 2008 - 3:04 pm

Actually, it's worse than that.  Users have been trained that when a
computer bluescreens and loses all of their data, it's either (a)
just the way things are, or (b) it's microsoft's fault.  Worse yet,
thanks to things like PC benchmarks, hard drive manufacturers have in
the past been encouraged to do things like lie to the OS about when
things had hit the hard drive platter just to score higher numbers on
winbench.

All of this is why I've in the past summed all of this up as Ted's law
of PC class hardware, which is that PC class hardware is cr*p.  :-)

            	    	      	       	       - Ted

--
To: Daniel Phillips <phillips@...>
Cc: Alan Cox <alan@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Sunday, March 16, 2008 - 2:56 am

What I mean is that in a PC, RAM contents are very fragile:
 - weak batteries in your UPS => end of game
 - loose power cable between UPS and PC => end of game (BTW I have a customer
   who had such a problem, cables had both disconnected because of their own
   weight).
 - kernel panic => end of game
 - user error during planned maintenance => end of game
 - flaky driver writing to wrong memory location => can't trust your data

In a normal PC, even if the RAM itself is a reliable component (ECC, ...)
a lot of such problems which may happen will render it unusable. If you
have to reboot, your BIOS will clean it up for you. That's why people are
trying to explain to you that linux is not reliable enough to work like
this.

Now if you have all your RAM on a PCI-E board with a battery and which is
not initialized by the BIOS so that it survives reboots, it changes a LOT
of things, because all the problems mentioned above go away. Let me
repeat it, the problem is not that those components are too unreliable
to build a transactional system, it is that used in this manner, a very
simple failure of any of them is enough to lose/corrupt all of your data.

That was not my experience when I was a student. We would buy very cheap
diskettes which were only sold by 100. 20% of them were already defective,
and 20% of the remaining ones could not keep our data till the next morning!
I knew guys who finally stopped copying games due to those diskettes, so

If you have understood what I explained above, now you'll understand that
I'm not underestimating the reliability of my PC, just the fact that keeping
access to my RAM contents involves a lot of components, any of which will


I thought this stupidity disappeared about 5 years ago ? I was about to
build PIC-based PS/2 "terminators" to plug into machines to avoid this

I never spoke about waiting for disk transactions. The RAM must be the
only source and target of user data. Disk is there for permanent storage
and should ...
To: Willy Tarreau <w@...>
Cc: Daniel Phillips <phillips@...>, Alan Cox <alan@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Sunday, March 16, 2008 - 6:12 pm

Not sure if things like SLR-2 or so are still available, except second
hand. But they at least provide compatibility for some time.
-- 
Krzysztof Halasa
--
To: Daniel Phillips <phillips@...>
Cc: Willy Tarreau <w@...>, Alan Cox <alan@...>, <linux-kernel@...>
Date: Sunday, March 16, 2008 - 1:24 am

I strongly disagree.  Cheap PC hardware is not even close to the quality
of a serious, branded machine.  Often capacitors are missing from power
lines, and the ones that are installed fail sooner.  Cooling fans are
lower quality and fail much sooner.  Timing issues abound.

There's a reason why an IBM is a better machine than a "Black-n-Gold":
IBM value their name so when you have a problem, they have a problem. 
Buy generic and when you get a problem they already have your money and
since they have no investment in their name, they have nothing more to
care about.
--
To: David Newall <davidn@...>
Cc: Daniel Phillips <phillips@...>, Willy Tarreau <w@...>, Alan Cox <alan@...>, <linux-kernel@...>
Date: Sunday, March 16, 2008 - 8:49 am

That's just nonsense in a consolidated market.

You change to IBM, then to Dell, then to HP
then again to IBM. Maybe you even try Sun.

That causes you more grief than any one of them.

I have seen people doing that in all industry branches
and even privately.

If you love brands, then your choice becomes very limited.
That's the real reason for them being much more expensive.

If you think machines and specs, then you have a much more clear
picture. After a while you even have your own measures for failure 
rates of those components and can handle it. No matter which brand :-)


Best Regards

Ingo Oeser
--
To: Daniel Phillips <phillips@...>
Cc: Willy Tarreau <w@...>, Alan Cox <alan@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Saturday, March 15, 2008 - 7:22 pm

it will mean that the window is larger, but it will also mean that if 
something else goes wrong and that window is not available the data that 
was written out will be useable (recent data will be lost, but older data 
will still be available)

as for things that can go wrong

the UPS battery can go bad
you can have multiple power failures in a short time so your battery is not fully charged
capacitors in the UPS can go bad
capacitors in the power supply can go bad
capacitors on the motherboard can go bad
a kernel bug can crash the system
a bug in a device driver (say nvidia graphics driver) can crash the system
a card in the system can lock up the system bus
the system power supply can die
the system fans can die and cause the system to overheat
cooling in the room the system is in can fail and cause the system to overheat
airflow to the computer can get blocked and cause the system to overheat
some other component in the computer can short out and cause the system to lose power internally

I have had every single one of these things happen to me over the years. 
Some on personal equipment, some on work equipment. At work I recently had 
a series of disasters where capacitors in a 7 figure UPS blew up, and a 
few days later during a power outage when we were running on generator, a 
fuel company made a mistake while adding fuel to the generator and knocked 
it out.

Even if you spend millions on equipment and professionals to set it up and 
maintain it, you can still go down.

You may not care about it on your system (because you copy data elsewhere 
and don't change it rapidly), but most people do. with your current 
approach you are slightly better than a couple shell scripts from an 
availability point of view, you are no better in performance, but your 
failure mode is complete disaster.

comparing you to 'cp drive ramdisk' at startup and 'rsync ramdisk drive' 
periodically and at shutdown you are faster at startup, close enough at 
shutdown as to be in the noise (eit...
To: <david@...>
Cc: Daniel Phillips <phillips@...>, Willy Tarreau <w@...>, Alan Cox <alan@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Saturday, March 15, 2008 - 7:57 pm

Actually modern DRAM can be put into "self refresh" mode, which doesn't
need (nor allow) any external accesses. Not very practical in a typical
PC case, though I think suspend to RAM uses it. Could be used for a
battery-backed RAID/disk controller as well.

Obviously it changes nothing WRT ramback.
-- 
Krzysztof Halasa
--
To: Daniel Phillips <phillips@...>
Cc: Willy Tarreau <w@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Saturday, March 15, 2008 - 5:03 pm

It makes a lot of difference, and in addition raid controllers (good
ones) respect barrier ordering in their RAM cache so they'll take tags or

Either you keep a mirror in sync and get normal data rates or you keep
the mirror out of sync and then you need to sort your writeback process
out to preserve ordering.

If you want ramback to be taken seriously then that is the interesting
problem to solve and clearly has multiple solutions if you would start to
take an objective look at your work.

--
To: Alan Cox <alan@...>
Cc: Willy Tarreau <w@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Saturday, March 15, 2008 - 6:00 pm

Ramback should obviously respect barriers, and it does, though at
present only in the crude, default way of letting the block layer
handle it.

But interpreting a barrier to mean flush through to rotating media...
performance will drop to the millisecond per transaction zone, like a
normal disk.  Not what ramback users want in normal operating mode.
Flush mode, yes.

Even raid controllers... so you agree that some of them just don't
respond conservatively to tagged commands, either because the engineers
don't know how to implement that (unlikely) or because they want to win
the performance benchmarks, and they do trust their battery?

"Some raid controllers" is just as good for my argument as "all raid
controllers".  Nobody is telling you which raid controller to use in
your own personal system.  I will pick the fast one and you can pick

Ramback already is taken seriously, just not by you.  That is fine, you
apparently do not need or want the speed.

Anyway, please do not get the impression that I am ignoring your ideas.
There are some nice, intermediate modes that ramback could and in my
opinion, should implement, to give users more options on how to trade
off performance against resilience.  I just need to make it clear that
ramback, as conceived, already gives system builders the capability
they need to achieve microsecond level transaction throughput and data
safety at the same time... given a reliable battery, which is where we
started.

Daniel
--
To: Daniel Phillips <phillips@...>
Cc: Willy Tarreau <w@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Saturday, March 15, 2008 - 7:05 pm

That isn't anything to do with what was being proposed. *ORDERING* not

The ones that don't respect tagged ordering are the ultra cheap nasty
things you buy down the local computer store that come with a 2 page
manual in something vaguely like English. The stuff used for real work is

I want the speed and reliability. Without that ramback is a distraction

You have no guarantee of commit to stable storage so your use of the word
"transaction" is a bit farcical.

There are a whole variety of ways to get far better results than "whoops
bang there goes the file system". Log structured backing media is one,
even snapshots. That way you'd quantify that for the cost of more
rotating storage (which is cheap) you can only lose "x" minutes of data
and will lose everything from a defined consistent point. File based
backing store also has similar properties done right, but needs some
higher level care to track closure and dirty blocks on a per inode basis.

Alan
--
To: Alan Cox <alan@...>
Cc: Willy Tarreau <w@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Sunday, March 16, 2008 - 5:57 pm

This is where you have made a fundamental mistake in your proposal.
Suppose you have a steady, heavy write load onto ramback.  Eventually,
the entire ramdisk will be dirty and you have to drop back to disk
speed, right?  My design does not suffer from that problem, but your
proposal does.

It gets worse than that.  Suppose somebody writes the same region
twice, how do you order that?  Do you try to store that new data
somewhere, keeping in mind that we are already at terabyte scale?  Is

Somebody has.  But please feel free to solve some other problem.  I

The UPS provides a guarantee of commit to stable storage.  No amount of
FUD will change that.  But please go ahead and calculate the risks
involved.  I am confident you will admit that there are standard
techniques available to ameliorate risk, which may be applied _on top of_
ramback, thus not destroying its microsecond-level transaction
performance as you propose.

Daniel
--
To: Daniel Phillips <phillips@...>
Cc: Alan Cox <alan@...>, Willy Tarreau <w@...>, <linux-kernel@...>
Date: Sunday, March 16, 2008 - 9:31 pm

What about system crashes?  They guarantee that data will be lost.  I
know opinions are divided on the subject of crashes: You say Linux
doesn't; everybody else says it does.  I side with experience.  (It does.)
--
To: David Newall <davidn@...>
Cc: Alan Cox <alan@...>, Willy Tarreau <w@...>, <linux-kernel@...>
Date: Sunday, March 16, 2008 - 10:42 pm

Not if it is mirrored and replicated.  Also nice if crashes are very

I say it does not crash often, to the point where I have not seen it
crash once for any reason I did not create myself (I tend to wait for
the occasional brown bag release to fade away before shifting development  We do get quite a few
reports of less mature systems like hald and usb causing problems, and
not too long ago NFS client was very crash happy.  I did see some of
those myself two years ago, and fixed them.

On the whole, Linux is very reliable.  Very very reliable.  Now mirror
that, replicate it, add in 2 x 2 redundant power supplies backed by
independent UPS units so you can do regular preemptive maintenance on
the batteries, and you have a sweet enterprise transaction processing
system.  All set for a faster than light moon shot :-)

Daniel
--
To: Daniel Phillips <phillips@...>
Cc: David Newall <davidn@...>, Alan Cox <alan@...>, Willy Tarreau <w@...>, <linux-kernel@...>
Date: Sunday, March 16, 2008 - 11:59 pm

if you are depending on replication over the network you have just limited 
your throughput to your network speed and latency. on an enterprise level 
machine the network can frequently be significantly slower than the disk 
array that you are so frantic to avoid waiting for.

David Lang
--
To: <david@...>
Cc: David Newall <davidn@...>, Alan Cox <alan@...>, Willy Tarreau <w@...>, <linux-kernel@...>
Date: Monday, March 17, 2008 - 1:52 am

Replication does not work that way.  On each replication cycle, the
differences between the most recent two volume snapshots go over the
network.  This strategy has the nice effect of consolidating rewrites.
There are also excellent delta compression opportunities.
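
[Editor's illustration, not from the original message.]  The snapshot-delta
idea described above can be sketched as follows.  This is a minimal sketch
under stated assumptions: a snapshot is modelled as a Python list of
equal-sized blocks, and the names snapshot_delta and apply_delta are
hypothetical, not part of ramback or any real replication tool.  It only
shows why repeated rewrites of one block cost a single block on the wire.

```python
def snapshot_delta(old, new):
    """Return {block_index: data} for every block that differs.

    Both snapshots are assumed to be lists of the same length, one
    entry per block (an illustrative model, not a real volume format).
    """
    if len(old) != len(new):
        raise ValueError("snapshots must cover the same number of blocks")
    return {i: b for i, (a, b) in enumerate(zip(old, new)) if a != b}


def apply_delta(replica, delta):
    """Bring a replica of the older snapshot up to the newer one."""
    for index, data in delta.items():
        replica[index] = data


# A tiny 8-block volume, snapshotted before and after some writes.
volume = [b"\0"] * 8
snap1 = list(volume)

# Block 3 is rewritten three times between snapshots, block 5 once.
for data in (b"a", b"b", b"c"):
    volume[3] = data
volume[5] = b"x"
snap2 = list(volume)

# Only the final contents of the two changed blocks are in the delta.
delta = snapshot_delta(snap1, snap2)
print(sorted(delta))
```

The delta holds two blocks even though four writes happened, which is the
"consolidating rewrites" effect; delta compression would be a further step
applied to the delta's contents.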

In the worst case, with insufficient bandwidth for the churn rate of
the volume, the replication cycle time increases to the time for
replicating the full volume.  Again, at worst, this would require extra storage for the
snapshot to be replicated equivalent to the original volume size, so
that the primary volume is not forced to wait synchronously for a
replication cycle to complete.

Mirroring on the other hand, makes a realtime copy of a volume, that is
never out of date.


Frantic... your word.  Designing for dependably high transaction rates
requires a different mode of thinking that some traditionalists seem to
be having some trouble with.

Daniel
--
To: Daniel Phillips <phillips@...>
Cc: <david@...>, Alan Cox <alan@...>, Willy Tarreau <w@...>, <linux-kernel@...>
Date: Monday, March 17, 2008 - 3:14 am

I think you've just tried to obfuscate the truth.  As you have
described, replication does not provide full protection against data
loss; it loses all changes since last cycle.  Recall that it was you who
introduced the word "replication", in the context of guaranteeing no
loss of data.  Then you ignored David's point about the relatively low
speed of networks, remarking only that mirroring is real-time.  Reading
between your words makes clear that "mirroring and replication" does

You've rather under-valued dependability, though.  Even your idea of
mirroring systems is incomplete, because failure of the principal system
requires transparent fail-over to the redundant system, which is
actually quite challenging, especially with commodity systems cobbled
together in the way you promote.  Remember that you claimed
microsecond-level transaction times, and 6-nines of availability.  The
former seems unlikely with replicated systems and, in the event of a
failure, you won't achieve the latter.

You still haven't investigated the benefit of your idea over a whopping
great buffer cache.  What's the point in all of this if it turns out, as
Alan hinted should be the case, that a big buffer cache gives much the
same performance?  You appear to have gone to a great deal of effort
without having performed quite simple yet obvious experiments.
--
To: David Newall <davidn@...>
Cc: <david@...>, Alan Cox <alan@...>, Willy Tarreau <w@...>, <linux-kernel@...>
Date: Monday, March 17, 2008 - 4:25 am

You are twisting words.  I may have said that replication provides a
point-in-time copy of a volume, which is exactly what it does, no more,

A big buffer cache does not provide a guarantee that the dirty cache
data is saved to disk when line power is lost.  If you would like to
add that feature to the Linux buffer cache, then please do it, or make
whichever other contribution you wish to make.  If you just want to
explain to me one more time that Linux, batteries, whatever, cannot
be relied on, then please do not include me in the CC list.

Daniel
--
To: Daniel Phillips <phillips@...>
Cc: David Newall <davidn@...>, <david@...>, Alan Cox <alan@...>, Willy Tarreau <w@...>, <linux-kernel@...>
Date: Sunday, March 23, 2008 - 5:33 am

on_battery_power:

sync
mount -o remount,sync /

...will of course work okay on any reasonable system. Not on yours,
because you have to do

echo i_really_mean_sync_when_i_say_sync > /hidden/file/somewhere
sync

(...which also shows that you are cheating).

Now, will you either do your homework and show that page cache is
somehow unsuitable for your job, or just stop wasting the bandwidth
with useless rants?
							Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To: Pavel Machek <pavel@...>
Cc: David Newall <davidn@...>, <david@...>, Alan Cox <alan@...>, Willy Tarreau <w@...>, <linux-kernel@...>
Date: Sunday, March 23, 2008 - 4:44 pm

Speaking of useless rants...

You need to go read the whole thread again, you missed the main bit.

Daniel
--
To: Daniel Phillips <phillips@...>
Cc: <david@...>, Alan Cox <alan@...>, Willy Tarreau <w@...>, <linux-kernel@...>
Date: Monday, March 17, 2008 - 2:56 pm

You said that you could achieve a certain performance, and later you
said that for reliability you could use mirroring and replication but
you never said that would lead to a performance hit.  In fact you don't
seem to be able to offer performance AND robustness; for performance you
can only offer that level of robustness attainable on a single system,
which means I think even you agreed was really not up to snuff for
But the filesystem does offer a minimum level of consistency, which is
missing from what you propose.  You propose writing nothing unless
line-power fails.  The big buffer cache gives you all of the robustness
of the underlying filesystem and including dirty buffer writes at some

I haven't said that at all, other than as an axiom (which even you have
agreed is fair) leading to comments on the results when something does
fail.  You keep saying that it won't ever fail, then that it will but
that you can mitigate using redundant systems; and then you gloss over
or refuse to face the attendant performance hit.  Finally, you still
have no idea whether your idea really does achieve a massive performance
boost.  You've never compared like amounts of RAM, nor the unsynced
updates that most closely resemble your idea.  In short, you've leaped
on what seems to you to be a good idea and steadfastly refused to
conduct even basic research.  What's the point?

You say don't cc you; I say go away, do that basic research, and come
back when you have hard data.  I really don't think you can ask for
fairer than that.
--
To: Daniel Phillips <phillips@...>
Cc: David Newall <davidn@...>, Alan Cox <alan@...>, Willy Tarreau <w@...>, <linux-kernel@...>
Date: Monday, March 17, 2008 - 2:49 am

so just mirror to a local disk array then.

a local disk array has more write bandwidth than a network connection to a 
remote machine, so if you can mirror to a remote machine you can mirror to 


if by traditionalists you mean everyone who makes a living keeping systems 
running you are right. we want sane failure modes as much as we want 
performance.

there will be times when we decide to go for speed at the expense of 
safety, but we want to do it knowingly, not when someone is promising both 
and only provides speed.

and by the way, if the violin box uses your software they have just moved 
from a resource for me to tap when needed to something that I will advise 
my company to avoid at all costs.

David Lang
--
To: <david@...>
Cc: David Newall <davidn@...>, Alan Cox <alan@...>, Willy Tarreau <w@...>, <linux-kernel@...>
Date: Monday, March 17, 2008 - 4:16 am

Great idea.  Except that the disk array has millisecond level latency,

So you could potentially connect to a _huge_ disk array and write deltas
to it.  The disk array would have to support roughly 3 Gbytes/second of
write bandwidth to keep up with the Violin ramdisk.  Doable, but you are
now in the serious heavy iron zone.

Personally, I like my nice simple design a lot more.  Just mirror it, as
many times as you need to satisfy your paranoia.  Or how about go write
your own?

Daniel
--
To: Daniel Phillips <phillips@...>
Cc: David Newall <davidn@...>, Alan Cox <alan@...>, Willy Tarreau <w@...>, <linux-kernel@...>
Date: Monday, March 17, 2008 - 10:42 am

your network will do less than 1 Gbit/sec, so to mirror in real-time (what 
you claim is trivial) you would need at least 24 network connections in 
parallel. that's a LOT harder to set up than a high performance disk array.

David Lang
--
To: Daniel Phillips <phillips@...>
Cc: David Newall <davidn@...>, Alan Cox <alan@...>, Willy Tarreau <w@...>, <linux-kernel@...>
Date: Monday, March 17, 2008 - 1:23 pm

by the way, the only way to get this much bandwidth between two machines 
is to directly connect PCI-e/16 card slots together. this is definitely 
not commodity hardware anymore (if it's even possible, PCI-e has some very 
short distance limitations)

David Lang
--
To: <david@...>
Cc: Daniel Phillips <phillips@...>, David Newall <davidn@...>, Alan Cox <alan@...>, <linux-kernel@...>
Date: Monday, March 17, 2008 - 1:30 pm

You can do that with 3 10GE NICs, though in practice that's not easy.
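
[Editor's illustration, not from the original message.]  The link counts
quoted in this sub-thread follow from simple arithmetic on the 3 Gbytes/second
figure given earlier for the Violin ramdisk.  A minimal sketch, assuming
each link delivers its full nominal rate with no protocol overhead (real
links deliver less, which is why "at least 24" was the wording used for
gigabit); links_needed is a hypothetical helper name.

```python
import math


def links_needed(target_gbytes_per_s, link_gbits_per_s):
    """Number of parallel links needed to carry a sustained write stream.

    Assumes every link runs at its full nominal rate, which is
    optimistic; treat the result as a lower bound.
    """
    target_gbits_per_s = target_gbytes_per_s * 8  # bytes -> bits
    return math.ceil(target_gbits_per_s / link_gbits_per_s)


gigabit_links = links_needed(3, 1)    # 1 Gbit/s Ethernet
ten_gig_links = links_needed(3, 10)   # 10 Gbit/s Ethernet
print(gigabit_links, ten_gig_links)
```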

Willy

--
To: Daniel Phillips <phillips@...>
Cc: <david@...>, David Newall <davidn@...>, Alan Cox <alan@...>, Willy Tarreau <w@...>, <linux-kernel@...>
Date: Monday, March 17, 2008 - 9:52 am

Just a point of information, most of the mid-tier and above disk arrays 
can do replication/mirroring behind the scenes (i.e., you write to one 
array and it takes care of replicating your write to one or more other 
arrays).  This behind-the-scenes replication can be over various types of 
connections - IP or fibre channel are probably the two most common paths.

That will still leave you with the normal latency for a small write to 
an array, which is (when you hit cache) on the order of 1-2 ms...
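
[Editor's illustration, not from the original message.]  To put the 1-2 ms
figure next to the "microsecond level" claims made elsewhere in the thread,
here is a rough sketch of the commit rate each latency allows.  It assumes
strictly serialized transactions, each waiting for the previous commit to
complete, with no batching or queueing; that assumption and the helper name
serial_commits_per_second are the editor's, not the thread's.

```python
def serial_commits_per_second(latency_us):
    """Upper bound on commits per second when each commit must finish
    before the next one starts (latency given in whole microseconds)."""
    return 1_000_000 // latency_us


array_best = serial_commits_per_second(1_000)   # 1 ms cached array write
array_worst = serial_commits_per_second(2_000)  # 2 ms cached array write
ram_speed = serial_commits_per_second(1)        # 1 microsecond commit
print(array_worst, array_best, ram_speed)
```

Batching several transactions per write would raise the array figures, so
these numbers only bound the one-commit-at-a-time case.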

ric
--
To: Daniel Phillips <phillips@...>
Cc: <david@...>, David Newall <davidn@...>, Willy Tarreau <w@...>, <linux-kernel@...>
Date: Monday, March 17, 2008 - 6:39 am

So we've all noticed

Alan
--
To: Daniel Phillips <phillips@...>
Cc: Willy Tarreau <w@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Sunday, March 16, 2008 - 5:55 pm

You only have to care about ordering if there is a store barrier between
the two (not usual). You only have to care about filling if you generate
enough dirty blocks at a very high rate (which is unusual for most
workloads). If you don't care about those then we have ramdisk already and
if you want to write a ramdisk driver for external ramdisk great. You'd
also fix the layering violations then by allowing device mapper to
implement things like snapshotting and writeback separated from your
driver.

Even in the extreme case that you propose there are trivial ways of
getting coherency. Simple example - if you can sweep all the data out in
say 10 minutes then you can buy twice the physical media and ensure that
one of the two sets of disk backups is genuinely store barrier consistent
to some snapshot time (say every 30 minutes but obviously user tunable).
If you at least had some kind of credible snapshotting you'd find people

Stable storage to most people means "won't go away on a bad happening".
Transaction likewise has a specific meaning in terms of an event occurring
once only and either being recorded before or after the transaction
occurred.

Alan
--
To: Alan Cox <alan@...>
Cc: Willy Tarreau <w@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Sunday, March 16, 2008 - 6:36 pm

Hi Alan,


According to you.  A more accurate statement: if you have the ramdisk
on the host, then the host is assumed to be reliable.  If the ramdisk
is external (http://www.violin-memory.com/products/violin1010.html)
then your statement is untrue in every sense.

But you did not address the logic of my statement above: that your
fundamental design prevents you from operating at ramdisk speed during

No wait, it is completely normal.  There is a barrier on every journal



Exactly the purpose for which this driver was written.  And as a bonus
it happens to be useful for internal ramdisk applications as well.  (It

Device mapper already can, so I do not get your point.  Also, what is


Hostility does not equate to accuracy.  Galileo comes to mind.

I see people arguing that a server+linux+batteries+mirroring+replication
cannot achieve enterprise grade reliability.  Balderdash.

Regards,

Daniel
--
To: Daniel Phillips <phillips@...>
Cc: Willy Tarreau <w@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Sunday, March 16, 2008 - 6:46 pm

> Hostility does not equate to accuracy.  Galileo comes to mind.

I see no attempt to even discuss the use of two sets
of physical storage to maintain coherent snapshots, just comments about
hostility. That's a fairly poor way to repay people who spend a lot of
time working with enterprise customers and are interested in solutions
using things like giant ramdisks and are putting in time to discuss

I look forward to seeing your constructive detailed analysis of failure
modes based upon actual statistical data from real data centres. Unless
you can produce that nobody is going to take you seriously, which is bad
luck for the poor folks at violin if they are relying on you.

Alan
--
To: Alan Cox <alan@...>
Cc: Willy Tarreau <w@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Sunday, March 16, 2008 - 7:39 pm

[Empty message]
To: Daniel Phillips <phillips@...>
Cc: Willy Tarreau <w@...>, David Newall <davidn@...>, <linux-kernel@...>
Date: Monday, March 17, 2008 - 7:53 am

> You did not explain how your proposal will avoid dropping the transaction


Here is a simple approach that uses a lot of physical storage (but hey,
disks are cheap)

You walk across the ram dirty table writing out chunks to backing
store 0.

At some point in time you want a consistent snapshot so you pick the next
write barrier point after this time and begin committing blocks dirtied
after that moment to store 1 (with blocks before that moment being
written to both). You don't permit more than one snapshot to be in
progress at once so at some point you clear all the blocks for store 0.
Your snapshotting interval is bounded by the time to write out the store,
nor do you have to throttle writes to the ramdisk.

You now have a consistent snapshot in store 0. At the next time interval
we finish off store 1 and spew new blocks to store 2, after 2 is complete
we go with 2, 0 and then 1 as the stable store.

The only other real trick needed then is metadata, but you don't have to
update that on disk too often and you only need two bits for each
page in RAM.

For any page it is either

00	Clean on stable store
01	Clean on current writing snapshot
10	Dirty on stable store (and thus both)
11	Dirty on current writing snapshot (but clean, old on stable)

Pages go 00->11 or 01->11 when they are touched, 11->01 or 10->01 when
they are written back.

At the point we freeze a snapshot we move 01->00 11->10 00->11 and there
are no pages in 10. And of course we don't update the big tables at this
instant instead we store the page state as

		(value - cycle_count)&3

with each freeze moment doing

		cycle_count++;
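
[Editor's illustration, not from the original message.]  The two-bit page
state and the cycle_count trick above can be sketched as below.  This is a
minimal model of the bookkeeping only, with no I/O; the class PageStates and
its method names are hypothetical.  Transitions follow the table as written
in the message: touching a page makes it 11, writing it back makes it 01,
and a freeze is just an increment of cycle_count, which shifts every page's
effective state down by one (mod 4) without rewriting the per-page table.

```python
CLEAN_STABLE = 0b00   # clean on stable store
CLEAN_CURRENT = 0b01  # clean on current writing snapshot
DIRTY_STABLE = 0b10   # dirty on stable store (and thus both)
DIRTY_CURRENT = 0b11  # dirty on current writing snapshot


class PageStates:
    def __init__(self, npages):
        self.cycle_count = 0
        self.raw = [0] * npages  # stored values, never touched by freeze()

    def state(self, page):
        # Effective state is derived from the stored value and the cycle.
        return (self.raw[page] - self.cycle_count) & 3

    def _set(self, page, state):
        # Store the value that decodes to `state` in the current cycle.
        self.raw[page] = (state + self.cycle_count) & 3

    def touch(self, page):
        # Any write to a page makes it dirty on the current snapshot.
        self._set(page, DIRTY_CURRENT)

    def written_back(self, page):
        # 11->01 or 10->01; clean pages are left alone.
        if self.state(page) in (DIRTY_STABLE, DIRTY_CURRENT):
            self._set(page, CLEAN_CURRENT)

    def freeze(self):
        # 01->00, 11->10, 00->11 for every page, in constant time.
        self.cycle_count += 1


ps = PageStates(3)
ps.touch(1)                # page 1: dirty on current snapshot
ps.touch(2)
ps.written_back(2)         # page 2: clean on current snapshot
before = [ps.state(p) for p in range(3)]
ps.freeze()
after = [ps.state(p) for p in range(3)]
print(before, after)
```

The message states that no page is in state 10 at the moment of a freeze;
the sketch does not enforce that, so a caller would need to write back all
10 pages before calling freeze() again.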

The 00-&gt;11 is perhaps not obvious but the logic is fairly simple. The
snapshot we are building does not magically contain the stable data from
a prev