Quote: Memory Is Getting Relatively Cheap

Submitted by Jeremy on November 15, 2007 - 10:06am.

"Memory is getting relatively cheap these days --- we're talking maybe US$30 to US$40 per megabyte if your machine can take SIMMS. Upgrading a machine from 2 meg to 4 meg doesn't cost *that* much money."

— Theodore Ts'o, in a November 15th, 1991 message on the Linux Activists mailing list.

God, I remember paying $1000

November 15, 2007 - 10:56am
Anonymous (not verified)

God, I remember paying $1000 for one megabyte of memory.

Now you can get 2 gigabytes for $30.

they ripped you off ;)

November 15, 2007 - 6:51pm
mangoo (not verified)

they ripped you off, certainly ;)

Cheap, but slow

November 16, 2007 - 1:47am
Lawrence D'Oliveiro (not verified)

How things have changed. Memory is cheaper than ever, but it's been outstripped by processor speeds. That means it's become more expensive, in terms of processor cycles, to actually access that memory. So you introduce various levels of caches to try to reduce the cost of memory accesses, but they're still not enough--you have to change the way you write code to take full advantage of the speed of today's processors. That means, for instance, not using lookup tables of precomputed values, because it has become faster to recompute those values whenever needed, than to access the memory containing the tables.

Bandwidth is pretty good; latency sucks.

November 16, 2007 - 2:59am

It really depends on what you're looking up. Modern L1s on AMD and Intel parts still have a 3-cycle load-to-use latency, so plenty of table lookups will still beat the equivalent computation, especially when you're doing many of them back to back. I use lookup tables for bit-reversal and bit-expansion to great effect in my code. Oftentimes the latency involved can get hidden. Pointer chasing, on the other hand, is painful.
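As an illustration of the bit-reversal case, here is a minimal C sketch, not the commenter's actual code. The names `rev_table`, `init_rev_table`, `reverse8_compute`, and `reverse32_lookup` are made up for this example. It shows both options: computing the reversal with shifts and masks, and reading it from a 256-byte table that should fit comfortably in L1. The four table loads in the 32-bit version do not depend on each other, which is one way the load latency can get hidden.

```c
#include <stdint.h>

/* Hypothetical 256-entry table: rev_table[b] holds b with its 8 bits mirrored. */
static uint8_t rev_table[256];

/* Reverse the bits of one byte with shifts and masks (no table access). */
static uint8_t reverse8_compute(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0u) >> 4) | ((b & 0x0Fu) << 4)); /* swap nibbles */
    b = (uint8_t)(((b & 0xCCu) >> 2) | ((b & 0x33u) << 2)); /* swap bit pairs */
    b = (uint8_t)(((b & 0xAAu) >> 1) | ((b & 0x55u) << 1)); /* swap neighbours */
    return b;
}

/* Fill the table once, before any lookups. */
static void init_rev_table(void)
{
    for (unsigned i = 0; i < 256; i++)
        rev_table[i] = reverse8_compute((uint8_t)i);
}

/* Reverse 32 bits with four table loads that are independent of one another. */
static uint32_t reverse32_lookup(uint32_t x)
{
    return ((uint32_t)rev_table[x & 0xFFu] << 24) |
           ((uint32_t)rev_table[(x >> 8) & 0xFFu] << 16) |
           ((uint32_t)rev_table[(x >> 16) & 0xFFu] << 8) |
            (uint32_t)rev_table[(x >> 24) & 0xFFu];
}
```

Which version wins depends on the processor and on whether the table stays in cache, so it is worth measuring both on the target.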

I have to deal with this at work. The DSP I work with has a 5-cycle latency on its load instructions, and that's when it *hits* L1 memory. So, a loop which reads a linked list can run no faster than 5 cycles per iteration. (It's an 8-issue VLIW DSP with predication, though, so you can at least do something with each list element in that time.) It's painful, though, to watch the DSP sit there twiddling its thumbs when someone writes a->b->c->d->e.
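The difference between pointer chasing and independent loads can be sketched in C. This is an assumed illustration, not code from the DSP in question; `struct node`, `sum_list`, and `sum_array` are hypothetical names. In the list version the address of each load comes from the previous load, so the loop cannot go faster than one load latency per element. In the array version the addresses come from the index, so a compiler or a wide machine is free to overlap the loads.

```c
#include <stddef.h>

/* Hypothetical singly linked list node. */
struct node {
    int value;
    struct node *next;
};

/* Pointer chasing: the load of n->next must complete before the
 * next iteration can even compute the address it needs. */
static long sum_list(const struct node *n)
{
    long total = 0;
    while (n != NULL) {
        total += n->value;
        n = n->next; /* dependent load: serializes the loop */
    }
    return total;
}

/* Array walk: each address depends only on the index i,
 * so the loads do not wait on one another. */
static long sum_array(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```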

Going to what you said about recomputing things, though... one positive side is that algorithms HAVE gotten more computationally intense. That is, the amount of math you have to do on each bit has gone up, such that the ratio of compute to bandwidth helps hide the growing gap between CPU speed and memory speed.

--
Program Intellivision and play Space Patrol!

Blackfin

November 16, 2007 - 11:24am
Anonymous (not verified)

There is no latency on the Blackfin for L1 reads. Carefully relocating code and data to it can produce spectacular results.

Not really

November 16, 2007 - 5:34pm

Blackfin has a 3-cycle load latency. Their assembly syntax hides it, but it's there. Address generation is in stage 5 of the pipeline and loaded data is available for use in stage 8. If you try to chase down a linked list without sufficient delay between your load instructions, you'll incur a bunch of stalls. Same with IIR computation. Take a look at slide 12:

http://www.analog.com/processors/pdf/bold/Prog_Opt_C_Code_on_Blackfin_slides.pdf

Judging from the pipeline, it looks like things could get really ugly if you had a store with a subsequent dependent load. I don't have one so I can't measure it.

--
Program Intellivision and play Space Patrol!
