Looks to me like more and more things are using the block discard
functionality, and as predicted it is slowing things down enormously.
The problem is that we still only discard tiny bits (a single range still??)
per TRIM command, rather than batching larger ranges and larger numbers
of ranges into single TRIM commands.
That's a very poor implementation, especially when things start enabling
it by default, e.g. the swap code, mke2fs, etc.
Ugh.
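
To make the batching idea concrete, here is a rough sketch of what I mean.
It is not the kernel's code, and every name in it (struct range,
coalesce_ranges, pack_trim_block) is made up for illustration. The only
thing it relies on is the ATA TRIM payload layout: each 512-byte block
carries 64 little-endian 8-byte entries, with a 48-bit starting LBA in the
low bits and a 16-bit sector count in the top bits. It assumes the caller
has already collected the pending ranges and sorted them by LBA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRIM_BLOCK_BYTES        512
#define TRIM_ENTRIES_PER_BLOCK  64       /* 512 bytes / 8 bytes per entry */
#define TRIM_MAX_SECTORS        0xFFFFu  /* count field is only 16 bits */

/* Hypothetical pending-discard record: start LBA plus length in sectors. */
struct range {
    uint64_t lba;
    uint64_t nsect;
};

/*
 * Merge adjacent or overlapping ranges in place.
 * Assumes r[] is sorted by lba. Returns the new number of ranges.
 */
static size_t coalesce_ranges(struct range *r, size_t n)
{
    size_t out = 0;

    if (n == 0)
        return 0;
    for (size_t i = 1; i < n; i++) {
        uint64_t end = r[out].lba + r[out].nsect;

        if (r[i].lba <= end) {
            /* Touches or overlaps the current range: extend it. */
            uint64_t iend = r[i].lba + r[i].nsect;
            if (iend > end)
                r[out].nsect = iend - r[out].lba;
        } else {
            r[++out] = r[i];
        }
    }
    return out + 1;
}

/*
 * Fill one 512-byte TRIM payload block with as many entries as fit.
 * (*idx, *off) is a cursor into r[]: which range we are on, and how many
 * of its sectors have already been packed. Ranges longer than 65535
 * sectors are split across several entries. Returns the number of
 * entries written; unused entries stay zero, which the drive ignores.
 */
static size_t pack_trim_block(const struct range *r, size_t n,
                              size_t *idx, uint64_t *off,
                              uint8_t payload[TRIM_BLOCK_BYTES])
{
    size_t used = 0;

    memset(payload, 0, TRIM_BLOCK_BYTES);
    while (*idx < n && used < TRIM_ENTRIES_PER_BLOCK) {
        uint64_t left = r[*idx].nsect - *off;

        if (left == 0) {            /* empty range: skip it */
            (*idx)++;
            *off = 0;
            continue;
        }

        uint64_t lba = r[*idx].lba + *off;
        uint64_t cnt = left > TRIM_MAX_SECTORS ? TRIM_MAX_SECTORS : left;

        assert(lba + cnt <= (1ULL << 48));   /* must fit a 48-bit LBA */

        uint64_t entry = lba | (cnt << 48);
        for (int b = 0; b < 8; b++)          /* store little-endian */
            payload[used * 8 + b] = (uint8_t)(entry >> (8 * b));
        used++;

        *off += cnt;
        if (*off == r[*idx].nsect) {         /* range fully packed */
            (*idx)++;
            *off = 0;
        }
    }
    return used;
}
```

The caller would keep calling pack_trim_block() until idx reaches n, and
hand the resulting blocks to the drive in as few TRIM commands as the
device allows, instead of issuing one command per tiny range.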