The AVR32 versions of readsb/writesb didn't look to me as if they'd
be quite as fast as the ARM ones either. If AVR32 has some analogue
of "stmia r1!, {r3 - r6}" for 16-byte burst stores, it's not using
it right now. (What was the bug you found in its readsb?)
Yes, I'd think the win would be most visible with hardware ECC, since
without it you've still got a second manual scan of each block. (And
I see you observed this too, after applying a workaround for an ECC
erratum you just learned about...) My numbers for one pair of trials
(the "16%" was an average of 6 runs) had a *lot* less system time,
which oddly enough went *up* after the switch to readsb/writesb:
Before:
    real    0m24.199s
    user    0m0.000s
    sys     0m5.630s

After:
    real    0m20.226s
    user    0m0.010s
    sys     0m6.000s
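
For reference, the relative change in this one pair of trials can be worked out with a few lines of C. This is only an arithmetic sketch; the helper name pct_change() is made up for illustration, and the inputs are the "time" values quoted above.

```c
/*
 * Relative change from 'before' to 'after', in percent.
 * Negative means the value went down (faster), positive means it went up.
 * pct_change() is a hypothetical helper, not part of any kernel API.
 */
static double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Values copied from the "time" output above, in seconds. */
static const double real_before = 24.199, real_after = 20.226;
static const double sys_before  = 5.630,  sys_after  = 6.000;
```

Calling pct_change(real_before, real_after) gives roughly -16.4% for wall-clock time, while pct_change(sys_before, sys_after) gives roughly +6.6% for system time, which matches the observation that "sys" rose even though the run got faster overall.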
However, the fact that you got a win even with soft ECC (and, I'm
guessing, slower RAM and slower readsb) suggests that this speedup
should be pretty generally applicable!
I wouldn't know. Just be sure not to lose all your badblocks data
when you convert ...
It's another one of those cases where the framework overhead has to be
low enough to make that practical. Last time I looked, the overhead to
set up and wait for a DMA of a couple KBytes was a significant chunk of
the cost to readsb()/writesb() the same data ... and that's even before
the data starts transferring.
Plus, the MTD layer currently assumes DMA is never used. Some of the
buffers it passes are not suitable for dma_map_single() since they
come from vmalloc.
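
The overhead argument above can be put as a simple break-even model: DMA only pays off once its per-byte savings cover its extra setup cost. The sketch below is a rough illustration, not a measurement; struct xfer_cost, xfer_time_ns() and dma_break_even_bytes() are hypothetical names, and any numbers fed into them are placeholders you would have to replace with figures from your own board.

```c
#include <stddef.h>
#include <stdint.h>

/* Linear cost model for one transfer. All values are assumptions. */
struct xfer_cost {
    double setup_ns;    /* fixed cost before the first byte moves */
    double per_byte_ns; /* marginal cost of each byte transferred */
};

/* Estimated time to move nbytes under the given cost model. */
static double xfer_time_ns(const struct xfer_cost *c, size_t nbytes)
{
    return c->setup_ns + c->per_byte_ns * (double)nbytes;
}

/*
 * Smallest transfer size at which DMA is no slower than PIO
 * (readsb/writesb). Returns SIZE_MAX if DMA never catches up.
 */
static size_t dma_break_even_bytes(const struct xfer_cost *pio,
                                   const struct xfer_cost *dma)
{
    double per_byte_gain = pio->per_byte_ns - dma->per_byte_ns;
    double extra_setup = dma->setup_ns - pio->setup_ns;
    double exact;
    size_t n;

    if (per_byte_gain <= 0.0)
        return SIZE_MAX; /* DMA is not cheaper per byte */
    if (extra_setup <= 0.0)
        return 0;        /* DMA is cheaper at every size */

    exact = extra_setup / per_byte_gain;
    n = (size_t)exact;
    if ((double)n < exact)
        n++;             /* round up to a whole byte */
    return n;
}
```

With made-up costs of 10 ns/byte and no setup for PIO, against 2 ns/byte plus 20000 ns of setup for DMA, the break-even point is 2500 bytes, so a 2 KByte page would still be faster with readsb(). Real numbers will differ, and the model ignores the CPU time that DMA frees up for other work.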
Sounds fair to me. Thanks; this has been sitting in my tree for many
months now. I finally made time to measure it and was pleasantly
surprised by the size of the win!
- Dave