Pretty much all CPUs word align on an 8-byte boundary (until we get those
128-bit boxes running), but not all can word align on 4 bytes. I was hoping
to make the buffer output somewhat the same across archs.
Otherwise, this is going to be quite a complex mess IMHO.
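
As a rough illustration only, padding every record out to an 8-byte
boundary could look something like the sketch below. The names REC_ALIGN,
rec_align_len() and rec_next_offset() are made up for this example and do
not come from any existing code; the 8-byte value is simply the alignment
discussed above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical alignment applied to every record, on every arch. */
#define REC_ALIGN 8u

/* Round len up to the next multiple of REC_ALIGN (must be a power of two). */
static inline size_t rec_align_len(size_t len)
{
	return (len + (REC_ALIGN - 1)) & ~(size_t)(REC_ALIGN - 1);
}

/*
 * Offset of the record that follows one starting at 'off' with 'len'
 * bytes of payload. If 'off' is 8-byte aligned, the result is too.
 */
static inline size_t rec_next_offset(size_t off, size_t len)
{
	return off + rec_align_len(len);
}
```

With a fixed alignment like this, a 13-byte record takes 16 bytes in the
buffer on 32-bit and 64-bit archs alike, so the layout does not depend on
what a given CPU happens to tolerate.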
-- Steve